ABSTRACT

Six studies of user interaction with on-line computer text editing 
systems are reported. Editing times for benchmark tasks on several 
editing systems were collected to gauge the range of performance across 
systems. Using the measured times, it was possible to predict when an 
editing system would outperform a typewriter. To model the user's 
behavior in greater detail, an information processing model of editing 
performance is proposed describing the user's "goals", "operators", 
"methods", and "selection rules". An important issue in such a model is 
how the model's accuracy depends on the grain of analysis. To find out, 
the model was recast at nine different levels of grain size and the 
accuracy of the different versions compared. From observations of users 
on several different systems, it was discovered that on each task, the 
users go through a similar sequence of task assimilation, target location, 
target modification, and verification. This concept of a "unit task cycle" 
was used to predict rough performance times for a proposed system prior 
to system specification. With respect to the target location part of a task, 
four devices for pointing to a target were compared and modeled. Using 
Fitts's Law, it is argued that the time for the best of these devices, the 
mouse, approaches the theoretical minimum. Finally, a Monte Carlo 
simulation model using gamma-distributed operator times and stochastic 
method selection rules is described, with which sequences of user actions, 
time per task, and the distribution of time can be predicted. 

The picture of user behavior that emerges from these studies is 
related to, but distinct from, behavior in classical problem-solving studies. 
The main difference is that the methods are almost certain of success. 
For any subproblem the user simply recalls the solution from his 
experience rather than working it out. Hence there is no search. Such 
behavior is expected to be found in many cognitive tasks in industrial 
work and daily life which people perform repetitively, tasks the report 
calls "routine cognitive skills." 
Introduction



Background 

Measurement Strategy 

A Typical User-System Dialogue 

Organization of the Thesis 



A person (a "user") sits before a computer terminal with a keyboard 
for input and a CRT display for output. In the computer is a text file. 
To the user's left is a manuscript, a printout of the text-file marked with 
modifications. The user, working through a computer program for text 
editing, is to effect each of the marked modifications in the text file, 
producing an updated file. Variations of the task occur with variations in 
the nature of the computer, the program (the "editor"), the terminal, the 
size of the manuscript, the type and number of corrections, the physical 
layout, and the familiarity of the user with the manuscript and the editor. 

What can we predict about the behavior of the user? What will be 
his sequence of actions? How long will it take him to perform them? 
What sorts of errors will he make? 

This thesis presents some answers to these and related questions. Why 
are they important? 

First, the questions are important in their own right. By the 1980s 
there will be a massive attempt to introduce systems of the sort described 
above into offices and clerical operations. The well-being of many 
workers as well as the technical success of the systems themselves will 
depend on how well the man-computer interface is designed. The 
quality of the interface will, in turn, depend on the knowledge base and 
theoretical base from which such questions can be answered. 




Second, the manuscript editing task is similar to several other tasks. 
For one thing it is a man-computer task. Increasingly, the industrial 
world is coming to be populated by tasks of man-computer interaction: 
electronic cash-register terminals, management information systems, 
airline reservation systems, "word-processing" systems. For another 
thing, it is an instance of what might be called a direction-following task: 
a worker follows simple step-by-step instructions to accomplish a larger 
end. Examples are a printer following proofreader's marks, a 
draughtsman implementing corrections indicated on a drawing, and a 
technician building a Heathkit. Detailed study of the manuscript editing 
task is likely to facilitate work on at least two other large classes of tasks. 

Third, the study of the manuscript editing task provides a vehicle for 
studying a domain of behavior the thesis will call routine cognitive skills. 
The current attempt to understand man as a symbolic information 
processing system has concentrated on certain domains of behavior: 
recall and recognition tasks, which reveal the mechanisms of learning and 
the structure of short-term and long-term memory; discrete symbolic 
puzzles and mathematical exercises, which reveal the nature of search in 
problem solving; discrete symbolic induction tasks, which reveal 
elementary concept acquisition; tasks of elementary sentence 
comprehension, decision and arithmetic, which reveal the nature of the 
immediate processor; and simple tasks that occur in child development. 
There remain, however, important domains of behavior for which we do 
not yet have any reasonable detailed theory nor any verification that the 
theory of man as a symbolic information processor provides an 
appropriate theoretical base. Routine cognitive skills is an example of 
one such domain. 

Fourth, the study of the manuscript editing task helps lay the 
foundations for an applied information processing psychology. It is not 
yet quite possible for a psychologist to compute routinely the answers to 
questions needed by a system designer from a description of the editing 
task environment the way a bridge designer uses the laws of physics. 
There is as yet no psychological civil engineering. But there almost is. 
For elementary single element tasks like reading a pointer dial or 
reacting to a display of lights it is possible to compute reaction times and 
order of merit relationships for design alternatives (cf. Welford, 1968) as 
it is for tracking and feedback display tasks (cf. Poulton, 1974). The 
various and recent advances in information processing psychology 
provide some of the methodological and quantitative base from which 
such a field would draw. The present thesis is the first part of a larger 
effort directed at making answers to such questions computable. 




Background 

Despite its practical application and its apparent fruitfulness as a 
research problem, there have been few studies of manuscript editing. 
The first studies seem to have been done by Oren (1974, 1975). He 
derived models for the time to do editing on word-processing machines, 
a rather different class of systems from the ones studied in this thesis, 
but did not collect user data or validate the models. Riddle (1976) 
attempted to evaluate several editors by comparing them with a list of 
features, but again no observations were made on what users actually do. 
As far as is known, user behavior for manuscript editing has never been 
studied. 

While manuscript editing has not been studied directly, results from 
industrial engineering, human factors, the study of man-machine systems, 
and psychology provide a context from which the present thesis has 
proceeded. 

Industrial engineers have produced a number of "predetermined time 
standards" (see, for example, Maynard, 1971). Measurements have been 
made of clerical tasks (Maynard, Aiken, and Lewis, 1960; Birn, Crossan, 
and Eastwood, 1961) and even mental operations (Quick, 1962). It is 
unfortunate that indications of the precision of the measurements, such as 
their standard deviation, have not been published with them. 
Furthermore, they are seriously undervalidated (for example, the data 
supporting the systems are unpublished), a partial consequence of their 
proprietary nature. What is most satisfying about the industrial 
engineering studies is the way in which times for novel sequences of 
operations may be derived from elementary operations. Part of the work 
in the thesis can be viewed as mental time and motion studies for editing 
systems. 

Whereas industrial engineering studies have tended to concentrate on 
sequences of operations and to ignore variance in their tabulated 
measures, human factors studies (for reviews, see Meister, 1976; 
McCormick, 1976) have tended to do comparisons between static 
alternatives and to emphasize the analysis of variance. The problems 
with this literature are the difficulty of extracting numbers on some 
normalized, natural scale (say "msec/char typed" from a table of "time to 
type a particular text") to use in predicting new results and the fact that 
few studies deal with sequences of behavior as opposed to a single 
operation. One of the most useful digests of human factors results has 
been the AIR data store (Payne and Altman, 1962) and its successors. 

"Man-machine studies" have been developed largely around the 
problems of aircraft and military operability. This literature, which exists 
largely in report form, outside of the journals, has been reviewed by 
Parsons (1972) and Pew, Baron, Feehrer, and Miller (1977). Interesting 
models now undergoing development are the Human Operator 
Simulator (Strieb, 1975) and SAINT (Seifert, Wortman, and Duket, 
1977). 

The older literature of psychology contains a number of classical 
studies of occupational tasks: Bryan and Harter's (1897, 1899), Tulloss's 
(1918), and Taylor's (1943) studies of telegraphy, and Book's (1908) 
studies of typewriting are examples. Regrettably, such studies are today 
seldom found in the main line psychology journals such as Psychological 
Review and Cognitive Psychology. The information-processing psychology 
literature, however, contains two books from which the general approach 
of this thesis has been derived: Newell and Simon's Human Problem 
Solving (1972) and Welford's Fundamentals of Skill (1968). 

Human Problem Solving described problem solving behavior in terms 
of a person's goals, operators, and methods and how these reflect the 
structural demands of the problem. Solving a problem is described as a 
search by the problem solver through a space of knowledge states. The 
thesis uses these same descriptive elements for the behavior of users in 
the manuscript editing task. But, although editing is a symbolic task with 
a well-defined goal, there is no search, no problem space. The selection 
of methods is completely routine. Yet the task demands considerable 
cognitive involvement. The analysis of the manuscript editing task is a 
step toward a generalization of the Human Problem Solving theory to 
include both skill and problem solving behavior. 

Fundamentals of Skill reviewed and codified knowledge of skilled 
performance. The thesis uses parts of the model of human performance 
and applies its account of hand movement directly. But the main 
influence of the book has been as an example of how basic and applied 
research in psychology can be interwoven to the advantage of both. As 
Professor Welford put the issue in another applied book, Ageing and 
Human Skill: 

It has often been said, with considerable truth, that theoretical studies in 
psychology divorced from any applied aim quickly lose perspective and 
become confused with minutiae. On the other hand, it is certain that 
the applied aim is better served by an understanding of the fundamental 
changes ... rather than by a series of ad hoc studies ... . (Welford, 
1958, p. 1) 

The present thesis contains both investigations whose point is basic 
knowledge and attempted applications of that knowledge. 
Measurement Strategy

Most of the experiments to be described involve the intensive analysis 
of a few users performing "naturalistic" tasks. The users are asked to do 
pretty much what they do in their daily work. For this reason, the 
laboratory results can be projected to the job with some confidence. But 
with a small number of subjects how can any claim be made for the 
reliability of the results? 

In order to conduct a scientific inquiry, there must be a source of 
repetition for the inductive method to work. One way to obtain 
repetition would be to run many users, making few measurements on 
each. Another way is to run a few users, but to record their behavior in 
detail. The reasons for preferring the latter technique in these studies 
are several: 

(1) It is the only way in which to observe the mechanisms which 
underlie the performance. Suppose there are two methods for 
performing a task and that 30% of the users do it one way and 70% do it 
the other way. If the times of the two groups are pooled, there is no 
easy way in which to understand the number that results. The pooled 
number represents the behavior of no one. It happens that sometimes 
users move the text on their terminals to a more convenient position 
before pointing at the word to be modified and sometimes they do not. 
Only by detailed observation can the obvious reason for this behavior be 
understood: they move the text when the target word is in the lower 1/3 
of the screen or completely off the bottom. 

(2) In many cases, the detailed study of few subjects is much more 
efficient. There are many ways in which systems can vary. They can use 
different names for the commands, different commands, different syntax, 
different terminal speed, response times, screen layouts, text selection 
schemes. Attempting to perform factorial experiments with several 
subjects per cell and a few measurements per subject leads to a 
combinatorial explosion in the number of conditions which must be run. 
Attempting to run a single factor or a few factors at a time risks missing 
very important interactions. It is also usually easier to comprehend that 
the user moves the text up when the next target is off the screen than it 
is to contemplate the significance of the Editor-type × Target-distance × 
Task-type triple interaction. It is often faster to observe a small number 
of users in detail, so as to understand the mechanisms at work, and then 
consider how these mechanisms might vary over a population of users. 

(3) Even if the above considerations didn't exist, it would still often 
be difficult to use a large number of users because the total world 
population of qualified users for a new or experimental system might be 
two people. Or the effort involved in running an experiment and 
analyzing the data for an experimental system might be so large as to 
make the expense of larger scale experiments unreasonable. The 
techniques used in this thesis can be seen as one way in which to 
overcome the often cited "small N problem" in industrial man-machine 
research by making the maximum use of the experimental hours 
available. 
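
The point of reason (1), that a pooled average can describe no actual user, can be illustrated with a small sketch. The times below are hypothetical, chosen only to match the 30%/70% split mentioned above; they are not measurements from these studies.

```python
from statistics import mean

# Hypothetical task times (sec) for ten users; illustrative numbers only.
method_a = [6.0, 6.5, 5.5]                              # 30% of users
method_b = [12.0, 11.5, 12.5, 12.0, 11.0, 13.0, 12.0]   # 70% of users

everyone = method_a + method_b
pooled = mean(everyone)

print(f"method A mean: {mean(method_a):.1f} sec")
print(f"method B mean: {mean(method_b):.1f} sec")
print(f"pooled mean:   {pooled:.1f} sec")

# Distance from the pooled mean to the nearest individual user's time.
closest_gap = min(abs(t - pooled) for t in everyone)
print(f"nearest user is {closest_gap:.1f} sec away from the pooled mean")
```

Under these assumed numbers the pooled mean falls between the two groups, so it matches neither method and no individual user.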

A Typical User-System Dialogue 

Before beginning the description of the studies it will be useful to 
bring the task into focus by describing a typical sequence of operations 
with an editing system. The user is seated as in Plate 1.1 with the 
manuscript to one side. She is about to use a typical line-oriented editor 
called POET (Russell, 1973) to make the corrections indicated in the 
fragment of manuscript shown in Plate 1.2. 

The first marking on the manuscript indicates the word "great" has 
been mistyped in the manuscript as "geart". A typical dialogue between 
the system and the user is as follows. (Ellipsis "..." is used to shorten 
long typeouts by the system; the character # is the "prompt character" 
the system displays when it is ready for the next command; carriage 
returns typed by the user are indicated as <CR>; the dialogue 
proceeds by adding to the bottom line of the display, "scrolling" the rest 
of the lines up and eventually off the top.) 

USER: "geart" (The user indicates the line he wants by typing between 
      quotation marks some characters from the line) 

The system responds internally by making its "Current Line" be the next 
line it can find that contains the sequence of characters "geart." 

SYSTEM: # 

USER: / (This command requests the system to display its Current Line. 
      The user issues the command to make sure the system is attending 
      to the line he intended.) 

The system displays the Current Line on the bottom line of the display 
screen. 

SYSTEM: approach will need a geart deal of... 

Now the user decides to make the change by using the system's 
"substitute" command to substitute the characters "grea" for the 
characters "gear." 
Plate 1.1. Physical layout of manuscript editing task. 
USER: s (Tells system to use the substitute command) 

SYSTEM: ubstitute (System completes typing of command name) 

USER: grea<CR> (Types new text followed by a carriage return) 

SYSTEM: (for) 

USER: gear<CR> (Types text to be replaced followed by a carriage return) 

SYSTEM: [OK] (Asks user to confirm that the command is correct before 
        system will execute it) 

USER: <CR> (Types carriage return to confirm) 



The system makes the substitution as many times as it can and responds 
by printing the number of substitutions made — in this case one. 

SYSTEM: 1 
SYSTEM: # 

Again the user wishes to look at the Current Line to see if, in fact, the 
modification was made as he intended. 

USER: / 

SYSTEM: approach will need a great deal of... 

SYSTEM: # 

It was. The user now proceeds to the next task. 
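
The behavior of the commands in this dialogue can be summarized in a toy model. The sketch below is a hypothetical illustration, not POET's implementation: the class name, method names, and the wrap-around search are assumptions, and the two text lines are the fragments shown in the dialogues (with their ellipses kept).

```python
class ToyLineEditor:
    """Hypothetical model of the line-oriented commands described above."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.current = 0  # index of the Current Line

    def find(self, text):
        """Quoted-string method: move to the next line containing text."""
        n = len(self.lines)
        # Assumption: the search starts after the Current Line and wraps.
        for step in range(1, n + 1):
            i = (self.current + step) % n
            if text in self.lines[i]:
                self.current = i
                return True
        return False

    def linefeed(self):
        """Linefeed method: advance to the next line and return it."""
        self.current = min(self.current + 1, len(self.lines) - 1)
        return self.lines[self.current]

    def show(self):
        """The "/" command: return the Current Line."""
        return self.lines[self.current]

    def substitute(self, new, old):
        """Replace every occurrence of old on the Current Line; return count."""
        line = self.lines[self.current]
        count = line.count(old)
        self.lines[self.current] = line.replace(old, new)
        return count


editor = ToyLineEditor([
    "approach will need a geart deal of...",
    "idiom approach by definition involves...",
])
editor.find("geart")
first_count = editor.substitute("grea", "gear")
first_line = editor.show()
editor.linefeed()
second_count = editor.substitute("Idi", "idi")
second_line = editor.show()
print(first_count, first_line)
print(second_count, second_line)
```

The sketch makes explicit that the user must first establish the Current Line (by quoted string or linefeed) before a substitution can be applied, which is the source of the extra operators in a line-oriented editor.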

In the next task, the word idiom is to be capitalized. The user 
decides he will do this by substituting Idi for idi. This time he uses the 
"linefeed method" rather than the "quoted string method" to locate the 
line to be changed. (Typing the linefeed key is abbreviated <LF>.) 



USER: <LF> (This is the command to print the next line and to make it 
      be the Current Line) 

SYSTEM: idiom approach by definition involves... 

SYSTEM: # 

USER: s 

SYSTEM: ubstitute 

USER: Idi<CR> 

SYSTEM: (for) 

USER: idi<CR> 

SYSTEM: [OK] 

USER: <CR> 

SYSTEM: 1 

SYSTEM: # 

USER: / 

SYSTEM: Idiom approach by definition involves... 

SYSTEM: # 

The user proceeds thus through the rest of the manuscript, making the 
indicated modifications as he comes to them. 

The POET editor is typical of a large class of editors in current use 
designed to be usable on a teletype hard-copy terminal. An example of a 
rather different sort of editor is RCG (a "display" editor inspired by the 
NLS editor at SRI; see Engelbart and English, 1968). This editor uses a 
five-key chordset by which commands can be entered and a pointing 
device called a mouse which allows the user to move a cursor on the 
screen by rolling a small truck across the desk. With this system the user 
could perform the task as follows. 

USER: RC (Typed on chordset) 

SYSTEM: <Displays "Replace Character" at top of screen> 

USER: <Points to first char in "idiom" with mouse> 
      <Presses button on mouse> 

SYSTEM: <Underlines character> 

USER: <Moves hands to keyboard> 
      I (Typed on keyboard) 
      <Moves left hand to chordset, right hand to mouse> 
      <Presses button on mouse> 

SYSTEM: <Redisplays entire screen of text with change made> 

The more complex operators POET finds necessary to indicate the 
target text to be modified are replaced in RCG by a simpler 
point-and-select operation. 

Many other schemes for building an editor are, of course, available. 
Some will have effects on user performance. 

Organization of the Thesis 

The six studies described in the following chapters fall into a cyclical 
pattern of basic research and application. The first basic research study, 
Chapter 2, seeks to establish by using benchmarks how much difference 
in user performance there is between different computer editing systems 
and to learn something of the reasons for the differences. What is 
learned in Chapter 2 is then applied in Chapter 3 to a prediction 
problem chosen and refereed by another research institute: when will a 
computer editing system be faster than a typewriter? Chapter 4 returns 
to basic research mode to investigate further the task exhibiting the most 
interesting differences in the benchmark studies: the manuscript editing 
task. This time the questions are whether an information processing 
model can be written for the manuscript editing task and what the 
appropriate grain size is for the model. Chapter 5 applies the concept of 
a "unit task" and the grain-size results gained from Chapter 4 to another 
problem: the prediction of how long it can be expected to take a user to 
perform various operations on a text processing system as a function of 
system design. The third time around the basic research-application 
cycle begins with Chapter 6 and Chapter 7. In Chapter 6, the focus is 
narrowed to a single sub-operation of the editing process: selecting a 
target piece of text on the screen of a CRT. In this case established 
theory from information-processing psychology is able to contribute a 
well-ordered account of the behavior. In Chapter 7 the empirical results 
from Chapter 2, the theoretical results of Chapter 4, and the empirical 
results from Chapter 6 are reduced to a running computer-simulation 
program for a display editor. 
A good place to begin the study of editing is to understand the ranges 
of performance induced as a consequence of editing system design. The 
study reported in this chapter examines differences along one of the 
fundamental dimensions of performance: the speed with which a task 
can be performed. It does this by examining user performance on 
several text-editing benchmarks. 

Editors 

Five editing systems were chosen for study: POET, SOS, TECO, 
Editor Y, and RCG. These five were chosen because of large differences 
in their designs and because all could be made to run on local computing 
equipment. Three of the editors (POET, SOS, and TECO) are 
"teletype-type" editors. They operate by printing out one line after 
another. Two of the editors are "display editors"; they show the user a 
picture of a page of text and readjust the picture after every editing 
modification. 
POET, a version of QED (Deutsch and Lampson, 1967), is line-oriented 
but with line numbers which are constantly changing because 
they specify relative position from the beginning of the file. 

SOS (Savitsky, 1969) is a line-oriented editor with fixed line-numbers. 

TECO is a character-oriented editor originally written by Daniel L. 
Murphy at Project MAC, M.I.T. TECO is distinguished by its large 
number of short commands, its programmability, and by the fanaticism it 
inspires among systems programmers. The experiment used the TENEX 
TECO version of the editor (Bolt, Beranek and Newman, 1973). 

Editor Y is an experimental display-oriented editor which uses the 
mouse (English and Engelbart, 1967) for pointing at the display. In 
command structure it is rather similar to POET. 

RCG is a display-oriented editor written by William Duvall; it is a 
descendant of the NLS editor (Engelbart and English, 1968). This editor 
also uses a mouse for pointing and a five-paddle chord device for input 
of commands. 

Benchmark Tasks 

The editors were compared by testing users' performances on four 
benchmark tasks: Letter Typing, Manuscript Modification, Text 
Assembly, and Table Typing. 

In the Letter Typing task the user typed a letter from a corrected 
manuscript. 

In the Manuscript Modification task the user was given a file 
containing a memo and a listing of that file with marked modifications. 
His task was to change the file as specified by the markings. 

For the Text Assembly Task the user was to assemble a report by 
combining three previously written paragraphs stored on separate files 
with another paragraph to be typed in from a manuscript. 

In the Table Typing task, the user typed a simple columnar table with 
headings. 

The materials given to the users for these tasks are reproduced in 
Appendix 2 at the end of the chapter. 



2.1 METHOD 

Subjects 

Subjects were 10 secretaries and professionals. All were experienced 
and expert users with the editors on which they were tested. Most users 
had used the system more than a year and had last used the system 
within the past week. About a quarter of the users had programmed or 
maintained the system. 

Design 

Because few users were experts in more than one or two of these 
editors and to avoid the possibility of practice effects from repeated 
exposure to the tasks, a mixed design was used with the editor as a 
between subjects variable. Each user was tested on a single editor. Each 
editor was tested using three users. Three is the smallest number which 
would give some notion of interuser variability and the largest for which 
experts on the different editors were available. Only one user was 
measured using SOS because of its similarity to POET. Each of the four 
tasks was done with the POET, SOS, TECO, and RCG editors. 
Performance on Editor Y for the Manuscript Modification task was 
measured at a later date. As a baseline against which to measure 
performance, one subject was also measured performing the tasks using 
an IBM Selectric II typewriter. 

Procedure and stimuli 

Users were seated in front of a 6 line/sec CRT terminal as shown in 
Plate 1.1. A session went as follows: The user was first given a set of 
general instructions urging him to work as fast as possible without 
making too many errors and stressing that the editor, not the user's 
abilities, was under examination. He was given a warmup task exercising 
the editor, then each of the four tasks in the order (1) Letter Typing, (2) 
Manuscript Modification, (3) Text Assembly, (4) Table Typing. The 
stimulus materials and instructions for each task were bound in a 
notebook and the subject was allowed to proceed through the tasks at his 
own pace. The experimental session was recorded on video tape, a video 
clock superimposing the time to a hundredth of a second on each video 
frame. 



2.2 TIME DIFFERENCES AMONG EDITORS 

Table 2.1 shows the time required for each task by users on the 
different editing systems. Performance on the Letter Typing task mainly 
reflected the typing abilities of the users. Performance on the Table 
Typing task mainly reflected the ingenuity of the users: methods varied 
from typing in the lines directly using fixed tabs provided by the system 
to making many copies of the first line in the table and substituting for 
the entries. In the Manuscript Modification task, however, there was a 
factor of 2.3 difference between the time required by users of the slowest 
and the fastest editing system. In the Text Assembly task the ratio 
between the slowest and fastest system was 1.6. This finding of 
differences among systems for benchmarks requiring extensive text 
modifications, but not for benchmarks dominated by typing or 
idiosyncratic user methods, is consonant with other (unpublished) 
benchmarking studies (Little, Note 1). 



TABLE 2.1 
Times for Benchmark Tasks (sec) 

                                    Task 
Editor        Letter      Manuscript      Table       Text 
              Typing      Modification    Typing      Assembly 

Typewriter    229         901             483         489 
POET          238±28 a    280±71          244±21      160±65 
SOS           315         241             234         147 
TECO          252±25      203±42          283±41      131±15 
Editor Y      -           133±27          -           - 
RCG           224±4       122±16          306±54      102±32 

a ± indicates standard deviation 
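
The slowest-to-fastest ratios quoted in the text can be recomputed from the editor means in Table 2.1. The following is a minimal sketch; the dictionary layout and the short task keys are naming choices made here, and the typewriter baseline is left out because the comparison is among editing systems.

```python
# Mean task times (sec) for the editing systems in Table 2.1.
# None marks a task that was not measured for that editor.
times = {
    "POET":     {"letter": 238,  "manuscript": 280, "table": 244,  "assembly": 160},
    "SOS":      {"letter": 315,  "manuscript": 241, "table": 234,  "assembly": 147},
    "TECO":     {"letter": 252,  "manuscript": 203, "table": 283,  "assembly": 131},
    "Editor Y": {"letter": None, "manuscript": 133, "table": None, "assembly": None},
    "RCG":      {"letter": 224,  "manuscript": 122, "table": 306,  "assembly": 102},
}


def slowest_to_fastest(task):
    """Ratio of the slowest to the fastest editor mean for one task."""
    values = [row[task] for row in times.values() if row[task] is not None]
    return max(values) / min(values)


for task in ("letter", "manuscript", "table", "assembly"):
    print(f"{task:>10}: {slowest_to_fastest(task):.1f}")
```

For the Manuscript Modification and Text Assembly tasks this gives the ratios of 2.3 and 1.6 cited above.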



It was much faster to use any of the editing systems than to use the 
typewriter on all tasks except for the Letter Typing task. Since it is 
assumed modifications are made to the text by retyping it, the time taken 
by the typewriter depends on the length of the text. But the time 
required by the editing systems depends on the number of modifications. 
Hence the ratio of 7.4 between the time to use a typewriter and the time 
to use the fastest editor would vary with changes in the length of the text 
and the number of modifications. 
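
The dependence described here can be sketched as a simple linear model. The sketch assumes, purely for illustration, that retyping time is proportional to the number of pages and that editing time is proportional to the number of modifications, with no fixed overhead; the two rate constants are taken from the Manuscript Modification column of Table 2.1 (a one-page letter with 12 modifications), and the function names are invented here.

```python
# Rates derived from Table 2.1, Manuscript Modification task.
TYPEWRITER_SEC_PER_PAGE = 901.0   # typewriter: whole one-page letter retyped
EDITOR_SEC_PER_MOD = 122.0 / 12   # RCG: 122 sec for 12 modifications


def typewriter_time(pages):
    """Assumed model: retyping time grows with the length of the text."""
    return TYPEWRITER_SEC_PER_PAGE * pages


def editor_time(n_mods):
    """Assumed model: editing time grows with the number of modifications."""
    return EDITOR_SEC_PER_MOD * n_mods


def speed_ratio(pages, n_mods):
    """How many times longer the typewriter takes than the editor."""
    return typewriter_time(pages) / editor_time(n_mods)


def break_even_mods_per_page():
    """Modifications per page at which the two times are equal."""
    return TYPEWRITER_SEC_PER_PAGE / EDITOR_SEC_PER_MOD


print(f"benchmark ratio: {speed_ratio(1, 12):.1f}")
print(f"ratio with 24 modifications: {speed_ratio(1, 24):.1f}")
print(f"break-even: {break_even_mods_per_page():.0f} modifications per page")
```

Under these assumptions the benchmark conditions reproduce the ratio of 7.4, and the ratio falls as the density of modifications rises.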

For both the Manuscript Modification task and the Text Assembly 
task, the editors performed in the same order from slowest to fastest: 
POET, TECO, Editor Y, and RCG. The display-oriented editors, Editor 
Y and RCG, as a group took about half as long as the scrolling editors 
to perform the Manuscript Modification task. 

The Manuscript Modification task, then, bears closer examination as a 
place where the design of the editors under review has made a difference 
in performance time. Table 2.2 displays the performance of tlie users in 
greater detail for this task. The manuscript to be edited is a one page 
letter on which 12 modifications were indicated. Each row in the main 
part of the table is tlie time required for one of the users to make the 
change. Cells marked "-" in Uie table indicate the user skipped the 
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TABLE 2.2 
Time per Modification for Manuscript Modification Task 







POET 




- 


SOS 

S12 




TECO 






EDITOR-Y 






RCG 






MOD 


S4 


S6 


S13 


S18 


319 


S20 


316 


S30 


S31 


S12 


S14 


315 




Tl 


22.5 


35.4 


15.4 




11.0 


11.7 


45.5* 


10.9 




5.9 


11.5 


19.8 




8.4 


8.6 


8.8 




T2    14.5  16.3   7.7        | 10.1  16.3   9.6   8.8        |  2.9   4.9   5.2        |  4.4   6.4*  4.4
T3    24.3* 20.0   9.2        | 15.6   8.9   9.5  11.6        | 13.1  11.2   5.6        |  5.0   6.2   3.9
T4    23.6  59.8* 23.2        | 29.9* 26.0* 14.1  13.0        |  8.5  12.0  18.9        | 21.8   9.5   4.5
T5    16.3  16.5  35.7        | 19.5* 10.8  18.8* 15.7        |  7.5  11.4  11.0        |  5.9   7.7  29.1*
T6    10.6  14.5  20.2        | 26.5*  7.9   9.3  10.9        | 10.4   7.6  43.4        |  5.7   9.0   7.9
T7    18.7  11.4  13.0        | 23.2*  6.2   8.8   9.1        |  7.8  11.8   9.4        |  4.9   5.0   9.8
T8    14.2  14.7  14.0        |  9.6  29.1* 12.2  12.3        |  4.8   8.3  14.9*       |  4.0   4.8  12.6
T9       -     -   8.7        |  8.2  12.6  11.5   7.8        |  8.0     -   5.8        |  7.5   1.9  10.4
T10      -     -  12.0        | 42.6* 18.6  17.6  19.1        |  5.4   9.2  10.5        |  5.4*    -   7.9
T11   10.8  14.3  34.6        |  8.0  11.3  11.9   5.4        |  7.4   6.0     -        |  5.3   3.6   9.5
T12   11.4  13.3   9.2        | 10.1*  7.6  10.9   5.2        |  4.6   7.4   9.8        | 10.5   7.2   7.9

All Tasks
Mean  16.7  21.6  16.9 (18.5) | 17.9  13.9  15.0  10.8 (13.1) |  7.2   9.2  14.0 (10.1) |  7.4   6.3   9.7  (7.8)
SD     5.3  15.0   9.7  (2.7) | 10.8   7.3  10.1   4.0  (2.1) |  2.8   2.5  10.9  (3.5) |  4.9   2.4   6.6  (1.7)
N       10    10    12    (3) |   12    12    12    15    (3) |   12    11    11    (3) |   12    11    12    (3)

Error-free Tasks Only
Mean  15.9  17.4  16.9 (16.7) | 10.4  11.2  11.5  10.8 (11.2) |  7.2   9.2  13.9 (10.1) |  7.5   6.0   8.0  (7.2)
SD     4.9   7.1   9.7  (0.8) |  8.8   3.9   2.7   4.0  (0.4) |  2.8   2.5  11.5  (3.4) |  5.4   2.5   2.7  (1.0)
N        9     9    12    (3) |    6    10    10    12    (3) |   12    11    10    (3) |   10     9    11    (3)

Values in parentheses are computed over users, one number per user.
* Indicates task on which user made an error.

modification. Those modifications on which the user made an error (for
example, substituting for the wrong string) are marked with an *. The 
effect of editing system design on time performance is best seen by 
examining the error-free "correct modifications." This is not to say that 
errors are unimportant, only that they require a separate, complementary 
analysis (with larger quantities of data). A single modification on which 
an error has occurred can, if the error requires a very long time to 
correct, seriously distort the editor comparisons. The average time per 
modification for each user is summarized at the bottom of the table. 
This calculation is made both for all modifications in the table and again 
for only those modifications on which the user did not make errors. 
Overall, the errors increase the time per modification by about 9% (range 
0% to 24%). The analyses that follow, therefore, use only the error-free 
modifications from Table 2.2. 

Examining the error-free tasks in Table 2.2, there is still a ratio of 2.3
between the slowest editor, POET, and the fastest, RCG. This compares with
an average ratio of 1.2 between the slowest and fastest user within each
editor. Thus, with respect to speed, differences in editor design are
considerably more important than differences among expert users.

Since there are small ways in which the fastest editor, RCG, could be
improved, and editors are known to exist which are even slower than
POET, it is probably justified to say, as a rough statement, that the
design of an editor makes a factor of 3 difference in the time to make
typical modifications to a manuscript.



2.3 SOURCES OF THE TIME DIFFERENCES 

What is the source of the observed differences in the time to use the 
editors? 

One way to look at the differences among editors is to consider how
much work the user has to do in order to accomplish a modification to
the text; one index of work is the number of keystrokes required. In
Figure 2.1 the time per modification is plotted against the keystrokes per
modification (for the user with the lowest error rate in each editor: S4,
S18, S30, S14). POET, SOS, TECO, and RCG fall exactly on a line
essentially through the origin (T_modification = 0.26 + 0.57 N_keystrokes;
r^2 > .999, s_e = 0.12 sec). Editor Y, however, takes four seconds per
modification (about a factor of 2) longer than expected. It is possible
that counting keystrokes does not capture some important part of the
interaction. More detailed comparison of the behavior of users using
Editor Y suggests that the users spent more time than expected in getting
the task with Editor Y and that the time required by the numerous
pointing operations needs to be considered. A more definitive
explanation requires additional experimentation.

Figure 2.1. Mean time per modification for editors as a function of the
keystrokes per modification



2.4 CONCLUSION

The design of an editor makes roughly a factor of 3 difference in the
time to edit a manuscript.

The number of keystrokes which must be typed to effect a
modification is an important factor in the time it takes to perform the
modification.

An approximation to the time per modification can be computed by
multiplying the number of keystrokes by 0.6 sec/keystroke.

Of course, as with any such simple method, some caution is in order.
There may be other features of the system, as in the case of Editor Y,
which have to be taken into account. Very brief command sequences
may also be harder to remember and thus substantially increase system
training time.
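
The keystroke rule of thumb can be sketched in a few lines of Python. This
is only an illustration: the function names and the sample keystroke counts
are assumptions, and the regression coefficients are the ones quoted for
Figure 2.1 as they read in the text of Section 2.3.

```python
def regression_time(keystrokes, intercept=0.26, slope=0.57):
    """Seconds per modification from the fitted line quoted for Figure 2.1."""
    return intercept + slope * keystrokes


def rule_of_thumb_time(keystrokes, sec_per_keystroke=0.6):
    """Rounded approximation from the conclusion: 0.6 sec per keystroke."""
    return sec_per_keystroke * keystrokes


if __name__ == "__main__":
    # Hypothetical keystroke counts, chosen only for illustration.
    for n in (5, 10, 20, 40):
        print(f"{n:3d} keystrokes: fit {regression_time(n):5.2f} sec, "
              f"rule of thumb {rule_of_thumb_time(n):5.2f} sec")
```

The two estimates stay within a few percent of each other over this range,
which is why the rounded rule is adequate for rough comparisons. Neither
accounts for effects such as the pointing time observed with Editor Y.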



REFERENCE NOTE 

1. Little, L. Personal communication on word processing benchmark 
studies conducted by him at Lawrence Livermore Laboratories. 



APPENDIX: BENCHMARK TASKS 

The following pages contain the instructions and material to be edited
which were given to the users in this experiment.

GENERAL INSTRUCTIONS

Please read these instructions to yourself as I read 
them aloud. 

The purpose of this experiment is to assess the ease with 
which certain common tasks can be performed with the 
current editor. It is not a test of your abilities. It 
is a test of the editor. 

You should try to perform the following tasks at your 
usual working speed. That is, I want you to work as 
swiftly as you can without making many mistakes. Should 
you happen to make a mistake, simply correct it as 
quickly as you can and go on. 

If it is natural for you to do so, and if you can do it 
without giving it any thought, you are encouraged to 
talk aloud as you work, saying what you are doing. 
However, do not "explain" or "justify" your actions to 
me and do not "introspect" on what you think you are 
doing as these things take time and will interfere with 
the tasks you are doing. 

Any questions? 

INSTRUCTIONS FOR TASK 1



The next page is the text of a letter on which some 
corrections have been marked. Your task is to type a new 
copy of the letter and to save it on a file called LET1.
You need not indent the precise number of columns 
present in the text so long as the appearance is similar. 

July 7, 197-



Manager, Credit Department 
Johnstone's Department Store 

Gentlemen: 

On April 7 Miss Heather Smith, account number 86153, 
returned for credit a clock radio that she purchased from you on 
April 2. 

Miss Smith's July statement does not show this credit. She now 
has in her possession the credit slip the clerk gave her at the 
time the radio was returned. Miss Smith would appreciate it if 
you would verify the credit and send her a corrected statement. 
The price was $57.20, tax included. 

In the meantime, please find Miss Smith's check for $420.91, 
which is the amount of the statement less the price of the 
returned merchandise. 

Sincerely yours,



Benjamin M. Ink 

INSTRUCTIONS FOR TASK 2

On the next page is the text of a memo on which several
corrections have been indicated. This memo has been
stored in your machine under the name MEM1. You are to
use your editor to make the indicated changes, storing
the corrected memo under the name MEM1A.

To: John Ingram (Northeast) Date: June 22, 1976

Betty Grailing (Southern) 

From: Dwaard Finwald 

Subject: DuluLi an Sales Activity Promotion 

« 

The attached information popp i ooentg a DATATRAN promotion proposed by 
the DATATRAN Business Center. 

The program represents a sales activity driven strategy to attain the 
DATATRAN operating plan for 1976, without detracting from ;>«fl^other 
products demands. 

The program is comprised of two phases: 

1. Phase I affects all salesmen including Account Managers and ASR's Sales 
Managers and BSTM's, and it directed to^DATATRAN sales order activity. 

2. Phase II is aimed at attaining net installations/net add plan activity for 
the third quarter for DATATRAN and is specifically geared toward the 
branch manager, branch sales manager/branch sales planning manager, and 
n^i udua l . iui ' i Engineers. 

The program will be administered by Field Operations in conjunction with 
the A. C. Abercrombie Company. 

The program has been reviewed and improved in concept by Sherwood 
Anderson and Elwood Reisling, and is reported to you for your review and 
comments. 

The program is planned for implementation May 1st, with Phase I ending 
August -3<Tth, while Phase II will be extended through the end of October. 

Individual awards and qualification levels are contained in the attached 
package. 



I would appreciate your review of this material and comments/critique prior 
to Friday, March Vj^ u^ S^l^e 

In the event that other B'usiness Center or individual product groups plan a 
promotion during » j i iii i 7 ui» period, the attached promotion will be adjusted 
to reflect a balance performance for attainment on behalf of the sales 
yflmanagers/branch oalo a managon ft bpancrn 'managers . 

In that connection, I mju l J keep you apprised of any additional 
developments. ^ ^ 

Thank you for your assistance, and I bo! ^ owe that activity of this nature is 
necessary to ensure a successful third quarter for DATATRAN devices as 
well as all products. 

DF/br 

INSTRUCTIONS FOR TASK 3

In this task you are first to type in the text on the next 
page, then assemble the rest of the manuscript from 
paragraphs pre-stored on your machine. You will also 
have to add a blank line between the paragraphs. The 
paragraphs to be assembled are stored as files 

TICS 
IDPS 
POGOS 

and they should be assembled together on the file OUT in 
that order. 

During the past six months, improvement of the systems
software has continued and software for a second phase 
document processing/communications has been specified 
and programming begun. Major accomplishments include: 

INSTRUCTIONS FOR TASK 4

On the next page is a table. Please type the heading and
indicated portions of the table and file it under TAB1.

CRITICAL VALUES FOR THE KOLMOGOROV-SMIRNOV TEST OF GOODNESS OF FIT




iRiyllie tabultttod vohi e. 

APPLICATION:
Predicting When an Editing System
is Better Than a Typewriter



3.1 MODEL

Brief description of the RAND experiment
Basic editing and typing models
Length crossover point L_c
Prediction of L_c for RAND experiment

3.2 CONFIDENCE INTERVALS

Confidence intervals for L_c
Confidence intervals for ρ_c

3.3 SENSITIVITY ANALYSIS

Length crossover density L_c
Density crossover point ρ_c
Total editing time T_e
Total typing time T_t

3.4 RESULTS AND DISCUSSION

Typing model
Editing model
Instability of L_c

3.5 CONCLUSIONS 



In the spring of 1975 the Applied Information Processing Psychology
Project was challenged by Ivan Sutherland and Frederick Blackwell at
the RAND Corporation in Santa Monica to predict, in advance, the
results of an experiment to be performed there.




It is sometimes said that because it takes less time to set up a
typewriter than a computer text editor, the former is better for short jobs
and the latter for long jobs. The basic purpose of their experiment was
to find the "crossover point," the length of text where the computer's
speed in making corrections began to outweigh the typewriter's ease of
setup.

This chapter presents the model which arose in response to this
challenge. The challenge provided the first opportunity to apply the
results of Chapter 2 to the prediction of user performance.



3.1 MODEL 

Brief Description of the RAND Experiment 

The experiment compared the time required to make modifications to 
five texts of varying lengths using an electric typewriter with the time to 
make the modifications using the WYLBUR editing system (Stanford, 
1975) running on a time-shared computer. The subjects were 12 
professional secretaries, each a proficient user of WYLBUR and of a 
typewriter. Each user edited all five texts twice, once with the typewriter, 
once with WYLBUR. Half the users used the typewriter first, half the 
editor. The order in which the texts were edited was varied 
systematically. 

Basic Typing and Editing Models 

The time T_t to produce a new copy of the same manuscript using a
typewriter depends only on the length of the manuscript and the setup
time of the typewriter:

    T_t = T_st + L T_l                                        (3.1)

where

    T_st = Time to set up the typewriter (in sec)
    L    = Length of the text (in lines)
    T_l  = Time to type a line (in sec).

The time T_e to perform a manuscript editing task, on the other
hand, depends on the number of modifications. Suppose that every
modification with an editing system took the same amount of time T_u
to perform. Suppose furthermore that secondary effects such as operator
fatigue and time to turn the page were absent or negligible. Then
T_e would be given by

    T_e = T_se + N_u T_u                                      (3.2)

where

    T_se = Time to set up the editor (in sec)
    N_u  = Number of modifications ("unit tasks") to be made
    T_u  = Time to do one editing modification (in sec).

Expressing Equation 3.2 in terms of the modification density, ρ =
N_u/L, makes it more comparable to Equation 3.1:

    T_e = T_se + ρ L T_u                                      (3.3)
Length Crossover Point L_c

If the typewriter is faster to set up (T_st < T_se), but the editor is faster
for making modifications (ρ T_u < T_l), then there exists some document
length L_c, called the length crossover point, such that for

    L > L_c, the editor is faster;
    L < L_c, the typewriter is faster.

To find L_c we use Equation 3.1 and Equation 3.3. The time for the
editor and the typewriter will be the same when

    T_se + ρ L_c T_u = T_st + L_c T_l.

That is,

    L_c = (T_se - T_st)/(T_l - ρ T_u).                        (3.4)

Thus, for

    L > (T_se - T_st)/(T_l - ρ T_u), the editor is faster; for
    L < (T_se - T_st)/(T_l - ρ T_u), the typewriter is faster.

Density Crossover Point ρ_c

Similarly there exists a certain density ρ_c, called the density crossover
point, such that for

    ρ < ρ_c, the editor is faster; for
    ρ > ρ_c, the typewriter is faster.

Solving Equation 3.4 for ρ gives

    ρ_c = T_l/T_u - (T_se - T_st)/(L T_u).                    (3.5)

Prediction of L_c for RAND experiment

What prediction can be made for the RAND editing experiment?
First we need to estimate the parameters of the above equations as best
we can.

    T_st  In connection with the experiment of Chapter 2 several
          measurements were available for the setup times for a
          typewriter: 20 sec, 32 sec, 19 sec (mean 23.7 sec), which
          rounds to

              T_st = 24 sec.

    T_se  Also in connection with the experiment of Chapter 2
          measurements were collected of the setup time for
          POET and SOS editors (both of which resemble
          WYLBUR): 14 sec, 16 sec, 13 sec, 5 sec, 13 sec (mean
          12.2 sec). Add to that about 25 sec to log into the
          computer (measured time to telephone a local computer
          and log into the TENEX operating system):

              T_se = 37 sec.

    ρ     Measurements of the five texts used in the RAND
          experiment yielded the values of ρ listed in Table 3.1.
          The individual tasks varied from ρ = 0.38 to ρ =
          0.64 with an average of

              ρ = 0.58 mod/line.

    T_u   Measurements made in connection with the experiment
          in Chapter 2 on the POET and SOS editors gave an
          average time per modification of

              T_u = 20 sec.^1

    T_l   The average typing rate for POET users in the last
          chapter was 0.22 sec/char. Since there are 63 char/line,

              T_l = 14 sec/line.

          Notice that in the time it takes to make one correction
          with WYLBUR, the user could have typed L = T_u/T_l
          = (20 sec)/(14 sec/line) = 1.4 lines = 16 words. It is
          evident that, contrary to popular opinion, it is much
          more prudent to type more slowly but carefully on an
          editing system than it is to type at high speed and
          correct the errors later.

TABLE 3.1
Measured Parameters for Texts Used in Experiment

                            Text
                  T1    T2    T3    T4    T5    All
L (lines)          4    10    21    26    90    151
N_u (mods)         2     6     8    14    58     88
ρ (mods/line)   0.50  0.60  0.38  0.54  0.64   0.58
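
The arithmetic behind these parameter estimates is easy to check. The short
Python sketch below recomputes the modification densities of Table 3.1 and
the derived typing time per line. The variable names are illustrative only
and do not come from the original study.

```python
# Lines and modifications per text, as listed in Table 3.1.
lines = {"T1": 4, "T2": 10, "T3": 21, "T4": 26, "T5": 90}
mods = {"T1": 2, "T2": 6, "T3": 8, "T4": 14, "T5": 58}

# Modification density rho = N_u / L for each text and for all texts pooled.
density = {text: mods[text] / lines[text] for text in lines}
overall_density = sum(mods.values()) / sum(lines.values())

# Typing time per line: 0.22 sec/char at 63 char/line.
sec_per_line = 0.22 * 63

# Lines that could be typed during one 20-sec modification, using the
# rounded value of 14 sec/line adopted in the text.
lines_per_modification = 20 / 14

if __name__ == "__main__":
    for text, rho in density.items():
        print(f"{text}: rho = {rho:.2f} mod/line")
    print(f"All: rho = {overall_density:.2f} mod/line")
    print(f"T_l = {sec_per_line:.2f} sec/line")
    print(f"{lines_per_modification:.1f} lines per modification")
```

Note that the pooled density (total modifications over total lines) is
dominated by the long text T5, which is why it sits above most of the
individual values.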

The length crossover point L_c is predicted to be

    L_c = (T_se - T_st)/(T_l - ρ T_u)
        = (37 - 24)/(14 - 0.58 × 20)
        = 5.4 lines.

The density crossover point ρ_c is predicted to be

    ρ_c = T_l/T_u - (T_se - T_st)/(L T_u)
        = 14/20 - (37 - 24)/(20 L)
        = 0.70 - 0.65/L.

As L → ∞, ρ_c → 0.7 modifications/line. Another way of putting
this result is: if there is more than one modification to be done every
(0.7)^-1 = 1.4 lines, then it is better to retype the text from scratch.
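
The two crossover formulas can be evaluated directly. The following is a
minimal Python sketch of Equations 3.4 and 3.5 under the parameter estimates
given above (setup times of 24 and 37 sec, 14 sec/line, 20 sec/modification).
The function and argument names are assumptions made for this example.

```python
def length_crossover(t_st, t_se, t_l, t_u, rho):
    """Length in lines beyond which the editor is faster (Equation 3.4).

    Assumes the editor is slower to set up (t_se > t_st). Returns None when
    rho * t_u >= t_l, because then the modifications on a line cost at least
    as much as retyping it and the editor never catches up.
    """
    gain_per_line = t_l - rho * t_u
    if gain_per_line <= 0:
        return None
    return (t_se - t_st) / gain_per_line


def density_crossover(t_st, t_se, t_l, t_u, length):
    """Density in mod/line below which the editor is faster (Equation 3.5)."""
    return t_l / t_u - (t_se - t_st) / (length * t_u)


if __name__ == "__main__":
    params = dict(t_st=24.0, t_se=37.0, t_l=14.0, t_u=20.0)
    print("L_c at rho = 0.58:", round(length_crossover(rho=0.58, **params), 1))
    # (t_se - t_st) / t_u = 13 / 20 = 0.65, so rho_c = 0.70 - 0.65 / L.
    for length in (5, 10, 20, 100, 1000):
        rho_c = density_crossover(length=length, **params)
        print(f"L = {length:4d} lines: rho_c = {rho_c:.3f} mod/line")
```

The loop shows how quickly the density crossover approaches its asymptote
T_l/T_u = 0.7 as the text gets longer.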

Plotting the time to modify a text predicted by Equation 3.1 and
Equation 3.3 as a function of the length of the text, it is apparent
(Figure 3.1) that the editor beats the typewriter immediately for almost
any task but addressing envelopes. But Figure 3.1 reveals that as the
length of the text increases the editor does not continue to increase its
superiority as much as might be expected (the bump in the WYLBUR
curve comes from the low density for task T3). Why?

The answer is that the density ρ = 0.58 chosen for the experiment
just happens to be near the critical crossover density ρ_c. Had the
experiment varied ρ, one text at the critical value would not have been
a problem, but as each of the texts sits near this critical point, local
fluctuations in T_u or ρ will push the value of T_l - ρ T_u back and
forth. Another way to display the model's prediction is to plot the
density crossover point ρ_c as a function of text length L (Figure 3.2)
using Equation 3.5. Note how near the tasks are to the crossover density
line. Because all of the texts sit near the density crossover line, it can be
predicted that the results of the RAND experiment will be equivocal: the
length crossover point L_c will not be well-defined.

What about predictions at other values of ρ? The prediction of task
time as a function of length of text for different values of ρ is plotted
in Figure 3.3. The typewriter either wins or loses immediately. This is
true because the difference in time required to set up for a typewriter
and to set up for WYLBUR is (for task lengths > 5 lines) a small
percentage of the time required to do the task.
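
The "wins or loses immediately" behavior can be seen by tabulating the two
linear models side by side. This Python sketch uses the parameter estimates
from this section as defaults. The densities and lengths in the loop are
arbitrary illustrative choices, not values from the experiment.

```python
def typing_time(length, t_st=24.0, t_l=14.0):
    """Typewriter model: setup time plus time to retype every line."""
    return t_st + length * t_l


def editing_time(length, rho, t_se=37.0, t_u=20.0):
    """Editor model: setup time plus time for rho * length modifications."""
    return t_se + rho * length * t_u


def faster_device(length, rho):
    """Name the device with the smaller predicted time."""
    if editing_time(length, rho) < typing_time(length):
        return "editor"
    return "typewriter"


if __name__ == "__main__":
    lengths = (2, 5, 20, 100)
    for rho in (0.2, 0.58, 1.0):
        row = ", ".join(f"L={n}: {faster_device(n, rho)}" for n in lengths)
        print(f"rho = {rho}: {row}")
```

At a low density the editor is ahead even for a two-line text, at a high
density the typewriter stays ahead at every length, and only near the
critical density does the winner depend on the length of the text.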



3.2 CONFIDENCE INTERVALS 

Recall that the analysis so far is entirely a prediction for the RAND 
experiment. Only the stimulus materials were available. All other 
parameters, including the typing rates of the users, were taken from other 
pre-existing experiments by analogy. Thus the parameters from which 
these predictions were produced are uncertainly determined. To what
extent are the conclusions dependent on the goodness of the parameter
estimates? One way to determine the effect of this uncertainty is to
compute some sort of confidence intervals for the predictions.

TABLE 3.2
Assumed Probability Distribution for Estimating Confidence Intervals

                         Probability
                    0.15     0.70     0.15
T_st (sec)            15       24       40
T_l  (sec/line)       10       14       20
T_se (sec)             3       37       60
T_u  (sec/mod)        15       20       40


Confidence Intervals for L_c

While we do not have a formal probability distribution for the likely
values of the parameters, we might claim to have some knowledge about
the distribution. For example, from observing a few people set up a
typewriter and from examining the standard deviation of the times measured
for this activity, it can be concluded with some confidence that the
probability that T_st > 10^ sec is vanishingly small. In Table 3.2 a high
value and a low value have been estimated for each parameter and each
given a probability of 0.15. From these estimated probability density
functions it is possible to compute the probability density functions for
Equation 3.2 to Equation 3.5.
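
One way such a computation might be organized is to enumerate every
combination of the three-point distributions in Table 3.2. The Python sketch
below does this and reports, for each text, the probability that the editor
is the faster device. It is an illustration only, not the computation behind
Figure 3.4: it assumes the four parameters vary independently, it uses the
Table 3.2 values as printed, it holds each text's length and density fixed
at the Table 3.1 values, and all names are made up for the example.

```python
from itertools import product

# Three-point distributions from Table 3.2: low, estimated, high.
PROBABILITIES = (0.15, 0.70, 0.15)
LEVELS = {
    "t_st": (15.0, 24.0, 40.0),  # typewriter setup (sec)
    "t_l": (10.0, 14.0, 20.0),   # typing time (sec/line)
    "t_se": (3.0, 37.0, 60.0),   # editor setup (sec)
    "t_u": (15.0, 20.0, 40.0),   # time per modification (sec/mod)
}


def joint_outcomes():
    """Yield (probability, values) for all 3**4 parameter combinations,
    assuming the parameters are statistically independent."""
    names = list(LEVELS)
    for picks in product(range(3), repeat=len(names)):
        probability = 1.0
        values = {}
        for name, index in zip(names, picks):
            probability *= PROBABILITIES[index]
            values[name] = LEVELS[name][index]
        yield probability, values


def prob_editor_faster(length, rho):
    """Total probability of the combinations in which T_e < T_t."""
    total = 0.0
    for probability, v in joint_outcomes():
        t_typing = v["t_st"] + length * v["t_l"]
        t_editing = v["t_se"] + rho * length * v["t_u"]
        if t_editing < t_typing:
            total += probability
    return total


if __name__ == "__main__":
    texts = {"T1": (4, 0.50), "T2": (10, 0.60), "T3": (21, 0.38),
             "T4": (26, 0.54), "T5": (90, 0.64)}
    for name, (length, rho) in texts.items():
        print(f"{name}: P(editor faster) = {prob_editor_faster(length, rho):.2f}")
```

Because the enumeration is exhaustive, the same loop could accumulate a
full distribution for any of the model's quantities, for example by
collecting the (probability, value) pairs and sorting them by value.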

The results of this rather lengthy computation are plotted in Figure
3.4. For texts T3, T4, and T5, WYLBUR should surely be faster. But
for T1 and T2 it is more difficult to predict.

Confidence Intervals for ρ_c

From a similar computation, Figure 3.5 plots confidence intervals for
ρ_c. The figure shows there is quite some uncertainty in the asymptote
for the curve. This comes about because the asymptote is given by
T_l/T_u. If the numerator were to get larger by a factor of two and the
denominator smaller by a factor of two it would throw the asymptote off
by a factor of four. The curve also illustrates that the transient part of
Equation 3.5 is of negligible consequence for practical purposes since its
influence is small for texts larger than 10 lines. In fact its influence is
quite modest for texts larger than 5 lines. All of the texts lie in the ±1σ
confidence band. Thus predictions of which side of the critical density
line the texts lie on are very uncertain.

3.3 SENSITIVITY ANALYSIS

Another way in which to determine the effect of uncertainties in the
parameters is to see how sensitive the predictions of the equations are to
jiggles in the parameter values.

Length Crossover Density (Equation 3.4)

Let A be the point <ρ, T_u, T_st, T_se, T_l> = <0.58 mod/line, 20
sec, 24 sec, 37 sec, 14 sec> in the space of all possible values of the
parameters in Equation 3.4. Let B = <ρ', T_u', T_st', T_se', T_l'> be
some other point in that space. Using a Taylor expansion about A, L_c
at B can be approximated by

    L_c(B) ≈ L_c(A) + ((B - A) · ∇) L_c(A)

where ∇ is the gradient operator, which indicates that partial derivatives
should be computed with respect to each of the parameters.

Writing L_c' as shorthand for L_c(B) and just L_c for L_c(A),

    L_c' ≈ L_c + [(ρ' - ρ) ∂/∂ρ + (T_u' - T_u) ∂/∂T_u
                 + (T_st' - T_st) ∂/∂T_st + (T_se' - T_se) ∂/∂T_se
                 + (T_l' - T_l) ∂/∂T_l] L_c
         = L_c + [∂L_c/∂ρ] (ρ' - ρ)
               + [∂L_c/∂T_u] (T_u' - T_u)
               + [∂L_c/∂T_st] (T_st' - T_st)
               + [∂L_c/∂T_se] (T_se' - T_se)
               + [∂L_c/∂T_l] (T_l' - T_l).

In order to normalize the magnitudes of the coefficients and the results,
we express this equation in a ratio form:

    (L_c' - L_c)/L_c ≈ [(ρ/L_c) ∂L_c/∂ρ] (ρ' - ρ)/ρ
                     + [(T_u/L_c) ∂L_c/∂T_u] (T_u' - T_u)/T_u
                     + [(T_st/L_c) ∂L_c/∂T_st] (T_st' - T_st)/T_st
                     + [(T_se/L_c) ∂L_c/∂T_se] (T_se' - T_se)/T_se
                     + [(T_l/L_c) ∂L_c/∂T_l] (T_l' - T_l)/T_l

or, using δx for (x' - x)/x,

    δL_c ≈ [(ρ/L_c) ∂L_c/∂ρ] δρ + [(T_u/L_c) ∂L_c/∂T_u] δT_u
         + [(T_st/L_c) ∂L_c/∂T_st] δT_st + [(T_se/L_c) ∂L_c/∂T_se] δT_se
         + [(T_l/L_c) ∂L_c/∂T_l] δT_l.

Evaluating the derivatives and substituting (T_se - T_st)/(T_l - ρ T_u) for
L_c gives

    δL_c ≈ [ρ T_u/(T_l - ρ T_u)] (δρ + δT_u)
         + [T_se/(T_se - T_st)] δT_se - [T_st/(T_se - T_st)] δT_st
         - [T_l/(T_l - ρ T_u)] δT_l.                          (3.6)

Equation 3.6 expresses relative changes in L_c as a linear combination of
relative changes in the parameters of Equation 3.4. The percentage
change in L_c is approximated as the sum of the percentage changes due
to each variable. The relative sensitivity of the predicted L_c to the
different parameters may thus be assessed directly from the relative size
of the coefficients. At ρ = 0.6 Equation 3.6 becomes

    δL_c = 6.00 δρ + 6.00 δT_u + 2.85 δT_se
         - 1.85 δT_st - 7.00 δT_l.

A 1% error in T_l will produce a 7% error in L_c. The values of the
coefficients for other values of ρ are plotted in Figure 3.6, as are those of
the three following equations. The value of L_c is more sensitive to
changes in T_u, ρ, and T_l than to changes in T_st and T_se, the ostensible
parameters of interest. The sensitivity analysis makes it quite clear (1)
that the prediction of L_c = 5 lines from the model is not robust over
changes in the parameters and (2) that it will be difficult to maintain
adequate control over the variables in the experiment at this level of ρ.
Considerable variance in the measured value of L_c is predicted. The
figure shows that the coefficients for δρ, δT_u, and δT_l are all very large
in the region between ρ = 0.6 and ρ = 0.8. Conversely, had the
experiment chosen ρ = 0.2, then

    δL_c = 0.40 δρ + 0.40 δT_u + 2.85 δT_se
         - 1.85 δT_st - 1.40 δT_l.

In this case L_c would have been much less affected by the parameters
other than T_st and T_se.
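
The coefficients quoted above follow from differentiating Equation 3.4, and
they can be reproduced with a few lines of Python. This is a sketch under
the parameter estimates used in this chapter. The function name and the
dictionary keys are assumptions made for the example.

```python
def length_crossover_sensitivities(rho, t_u=20.0, t_st=24.0, t_se=37.0,
                                   t_l=14.0):
    """Relative sensitivities (x / L_c) * dL_c/dx for each parameter x of
    L_c = (t_se - t_st) / (t_l - rho * t_u)."""
    gain_per_line = t_l - rho * t_u   # denominator of Equation 3.4
    setup_gap = t_se - t_st           # numerator of Equation 3.4
    return {
        "rho": rho * t_u / gain_per_line,
        "t_u": rho * t_u / gain_per_line,
        "t_se": t_se / setup_gap,
        "t_st": -t_st / setup_gap,
        "t_l": -t_l / gain_per_line,
    }


if __name__ == "__main__":
    for rho in (0.2, 0.6):
        coefficients = length_crossover_sensitivities(rho)
        terms = ", ".join(f"{name}: {value:+.2f}"
                          for name, value in coefficients.items())
        print(f"rho = {rho}: {terms}")
```

The coefficients for rho, t_u, and t_l all share the denominator
t_l - rho * t_u, which goes to zero as rho approaches t_l / t_u = 0.7. That
is why the prediction of the crossover length becomes unstable for texts
whose density is near 0.7.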

Density Crossover Point ρ_c (Equation 3.5)

Proceeding similarly for Equation 3.5,

    δρ_c ≈ [(T_st/ρ_c) ∂ρ_c/∂T_st] δT_st + [(T_se/ρ_c) ∂ρ_c/∂T_se] δT_se
         + [(L/ρ_c) ∂ρ_c/∂L] δL + [(T_u/ρ_c) ∂ρ_c/∂T_u] δT_u
         + [(T_l/ρ_c) ∂ρ_c/∂T_l] δT_l

         = [1 - (T_se - L T_l)/T_st]^-1 δT_st
         + [1 - (T_st + L T_l)/T_se]^-1 δT_se
         + [L T_l/(T_se - T_st) - 1]^-1 δL
         - δT_u
         + [1 - (T_se - T_st)/(L T_l)]^-1 δT_l.               (3.7)

At L = 20 lines, Equation 3.7 becomes

    δρ_c = 0.09 δT_st - 0.14 δT_se + 0.05 δL
         - 1.00 δT_u + 1.05 δT_l.

A 1% change in either T_u or T_l will produce a 1% change in ρ_c. A
1% change in the other parameters produces only a negligible change in
ρ_c. For texts of reasonable length (> 10 lines), ρ_c will depend mainly
on T_u and T_l.



Total Editing Time T_e (Equation 3.3)

    δT_e ≈ [(T_se/T_e) ∂T_e/∂T_se] δT_se + [(ρ/T_e) ∂T_e/∂ρ] δρ
         + [(T_u/T_e) ∂T_e/∂T_u] δT_u + [(L/T_e) ∂T_e/∂L] δL

         = (1 + ρ L T_u/T_se)^-1 δT_se
         + (1 + T_se/(ρ L T_u))^-1 (δρ + δT_u + δL).           (3.8)

At L = 20 lines, ρ = 0.6 mod/line,

    δT_e = 0.13 δT_se + 0.87 δρ + 0.87 δT_u
         + 0.87 δL.

Sensitivity to T_se fades quickly as L increases. A 1% change in the other
arguments produces a little less than a 1% change in T_e.

Total Typing Time T_t (Equation 3.1)

    δT_t ≈ [(T_st/T_t) ∂T_t/∂T_st] δT_st + [(L/T_t) ∂T_t/∂L] δL
         + [(T_l/T_t) ∂T_t/∂T_l] δT_l

         = (1 + L T_l/T_st)^-1 δT_st
         + (1 + T_st/(L T_l))^-1 (δL + δT_l).                  (3.9)

At L = 20 lines,

    δT_t = 0.08 δT_st + 0.92 δL + 0.92 δT_l.

Again sensitivity to T_st, the setup time, fades quickly as L increases. And
again a 1% change in the other parameters produces a little less than a
1% change in T_t.
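
Relative sensitivities of this kind can also be estimated numerically,
which is a useful cross-check on the algebra. The Python sketch below
approximates (x/f) ∂f/∂x by a central finite difference and applies it to
the density crossover point and to the two total-time models at L = 20
lines. The helper name `relative_sensitivity`, the step size, and the
argument names are assumptions made for this example.

```python
def relative_sensitivity(func, params, name, step=1e-6):
    """Approximate (x / f) * df/dx at params by a central difference."""
    x = params[name]
    h = abs(x) * step
    upper = dict(params, **{name: x + h})
    lower = dict(params, **{name: x - h})
    derivative = (func(**upper) - func(**lower)) / (2 * h)
    return x * derivative / func(**params)


def density_crossover(t_st, t_se, t_l, t_u, length):
    """Equation 3.5."""
    return t_l / t_u - (t_se - t_st) / (length * t_u)


def editing_time(t_se, rho, length, t_u):
    """Equation 3.3."""
    return t_se + rho * length * t_u


def typing_time(t_st, length, t_l):
    """Equation 3.1."""
    return t_st + length * t_l


if __name__ == "__main__":
    cases = {
        "rho_c": (density_crossover,
                  dict(t_st=24.0, t_se=37.0, t_l=14.0, t_u=20.0, length=20.0)),
        "T_e": (editing_time,
                dict(t_se=37.0, rho=0.6, length=20.0, t_u=20.0)),
        "T_t": (typing_time,
                dict(t_st=24.0, length=20.0, t_l=14.0)),
    }
    for label, (func, params) in cases.items():
        terms = ", ".join(
            f"{name}: {relative_sensitivity(func, params, name):+.3f}"
            for name in params)
        print(f"{label}: {terms}")
```

The numerical approach needs no derivation, so it extends directly to
variants of the model for which closed-form coefficients would be tedious
to work out by hand.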

What the results of the confidence interval and sensitivity analyses tell us
is that while it may be possible to predict the value of L_c functionally,
that is, to produce an equation whose evaluation will give a reasonable
value for L_c, it is not possible to predict the value of L_c numerically with
any certainty on this set of texts because they are all set so near to ρ_c.
Small errors in the parameter values will cause large errors in the
predictions. The analyses tell us further that the experiment is not likely
to produce a well-defined value of ρ_c against which to compare a
prediction. On the other hand, predictions of total time to process each
text are likely to be reasonable and, in fact, to depend very little on the
setup times of the editor or the typewriter.



3.4 RESULTS AND DISCUSSION

By agreement the predictions above were mailed to RAND at the
same time as RAND mailed the results from the experiment. At the
time at which the exchange took place, the data from only 8 subjects had
been tabulated. The following analyses are based on the results of those
8 subjects.

Figure 3.7 shows the time for each task as a function of the length of
the task for both the typewriter and for WYLBUR and is comparable to
the prediction in Figure 3.1. As predicted, the crossover point L_c was
not well-defined. Connecting the mean observed times produces three
crossover points. The heavy overlap of the error bars makes it unlikely
that the times for texts T2, T3, and T5 are reliably different from one
another.





TABLE 3.3
Parameter Estimates

                     Pred.     Obs.     %Dif
T_st (sec)             24        5      -85%
T_l  (sec/line)        14       18       22%
T_se (sec)             37      179      649%
T_u  (sec/mod)         20       16      -20%


How good were the simple models of typing and editing in Equation
3.1 and Equation 3.3? The comparison needs to be made in two ways.
First, how good were the models at predicting the result in advance of
any knowledge about the outcome? This zero-parameter prediction is
usual in practice where, as in this case, good values for the parameters
are not known. Second, how good were the models at predicting the
result given knowledge of the parameter values? This two-parameter
prediction (two values must be estimated from the data) allows an
evaluation to be made of the accuracy of the functional form of the
model. It allows us to partition the blame for errors in the model
between errors in estimating the parameters and errors in the form of the
equations. In order to make two-parameter predictions, estimates of the
parameters were made from regressions on the RAND data. A
comparison between the parameters estimated in this way and the values
assumed for making the predictions is given in Table 3.3. The estimates
were farthest off (649% and 86%) for the setup times T_st and T_se.
They were much closer (22% and 20%) for the rate parameters T_l and
T_u.

TABLE 3.4
Comparison of Typing Model T_t = T_st + L T_l with Data

                    SOURCES OF ERROR
               PARAMETERS
TEXT     T_st     T_l    SubTotal    MODEL    TOTAL
T1       +27%    -18%        9%        0%        9%
T2       +11%    -19%       -8%       -2%      -10%
T3        +4%    -20%      -16%       -5%      -20%
T4        +4%    -20%      -16%       +3%      -13%
T5        +1%    -20%      -20%        0%      -20%
Mean       9%    -19%      -10%       -1%      -11%

Typing Model 

Figure 3.8 compares the predicted and observed times for the typing
model, Equation 3.1. The zero-parameter prediction is indicated by a
dotted line, while the two-parameter prediction is indicated by a solid
line. When the actual typing rates of the subjects are used in the
equation, the fit to the data is excellent. Using the Taylor approximation
of the model, Equation 3.9, we can assign blame for the sources of error
in the zero-parameter prediction. These errors are tabulated for each
text in Table 3.4. On the average, the prediction was about 12% too low.
Almost all of this error (10%) resulted from the error in estimating the
parameters; only 2% resulted from lack of fit between the
model and the data. Although the estimate for the T_st parameter was
much worse than the estimate for T_l, T_l was the source of twice as
much error as was T_st (19% to 9%). Since the errors were in opposite
directions, they partially offset each other.

TABLE 3.5
Comparison of WYLBUR Model T_e = T_se + ρ L T_u with Data

                    SOURCES OF ERROR
               PARAMETERS
TEXT     T_se     T_u    SubTotal    MODEL    TOTAL
T1       -67%     +4%      -63%      +46%      -47%
T2       -51%     +9%      -42%      +17%      -33%
T3       -47%    +11%      -36%      -17%      -46%
T4       -35%    +14%      -21%      -13%      -32%
T5       -13%    +22%       +9%       +1%      +10%
Ave.     -43%    +12%      -31%       +7%      -24%



Editing Model 

The editing model of Equation 3.3 is compared with the observed 
times in Figure 3.9. Again there is a good fit between the observed and 
predicted editing times. Analyzing the fit in terms of the Taylor 
approximation (see Table 3.5), the model was about 24% too low. Again, 
errors in estimating the input parameters were responsible for 
considerably more error (31%) than was lack of fit to the model (19%). 
This time the major source of errors in estimating the parameters was 
from underestimating the setup time of the editor. It is instructive to 
note the frequency with which the various sources of errors partially 
cancel each other. 

Figure 3.9. Fit of editing model to experimental data



3.5 CONCLUSIONS

The point of the foregoing exercise was to explore, as a sort of case
study, how much insight could be gained from simple models. There
were two main results.

First, with the simplest imaginable model it was possible to produce
several predictions leading to practical insight. A formula for the length
crossover point L_c was produced showing its functional dependence on
other associated variables. A related concept of a density crossover point
ρ_c was identified and expressed in functional form. It was possible to
predict some unfortunate consequences of an unlucky choice in
modification density for the experiment. In fact, without the insight of
this derivation, the results of the experiment would have been difficult to
interpret at all.

Second, the major errors in the predictions made by these simple 
models did not result because they were too simple, but because of errors 
in the values of the input parameters. For these predictions, a more 
sophisticated model would have been useful only to the extent it allowed 
one to escape the dependence on such uncertainly determined input 
parameters. 

The sensitivity analysis identified those parts of the prediction in 
which little confidence could be placed. It also allowed credit and blame 
to be assigned after the data were in. 

NOTES 

^This number is slightly different from the numbers listed in Table 
2.2, since those numbers reflect a later re-analysis of the video tapes. In 
order to preserve the original predictions, the originally measured 
parametric estimate for T^ is used in this chapter. 

In the previous chapters we have analyzed tasks with a constant 
time/modification model. In this chapter we will break up the 
modification cycle into smaller pieces to refine our understanding of the 
process. 

The questions we now address are (1) Can the behavior of a user in 
a manuscript editing task be described as the combination and 
recombination of a small number of elementary behavioral acts, much as 
many molecules come from few atoms? (2) If so, can we predict the 
stream of these acts from an analysis of the editing task environment? 
Can we measure the time required by the elementary acts and use them 
to predict the time to do an editing task? Finally, (3) what grain size is 
appropriate for choosing the acts? Should they be at the level of a single 
modification in the text, or at the level of an individual motor 
movement? 

Adapted from Card, Moran, and Newell (1976). 

This chapter considers a family of information-processing models. 
Each breaks the modification cycle at a different grain size. Their 
predictions are compared in detail with the behavior of a user 
employing the POET editor on the manuscript editing task. 



4.1 THE GOMS MODEL 

Previous chapters said nothing about the mechanisms in the user 
that allow him to perform the editing task. The present chapter 
proposes, as a theory of the user, that his performance can be described 
by a set of Goals, a set of Operators, a set of Methods for achieving the 
goals, and a set of Selection Rules for choosing among a goal's 
competing methods. For short, we shall call a model specified by such 
components a GOMS model. 

Example: Model M4B 

As an example of the basic concepts of a GOMS model and the 
notation used, let us consider a particular model, called M4B, of the 
manuscript-editing task. According to the model, the user begins with 
the top level goal: 

GOAL: EDIT MANUSCRIPT. 

It is a characteristic of manuscript-editing that the larger task of 
editing the manuscript is composed of a collection of small edit tasks, 
called unit tasks, that are almost completely independent of each other. 
Thus, the obvious method for accomplishing the top level goal is to go 
through the individual unit tasks one by one: 

GOAL: EDIT MANUSCRIPT 
    GOAL: EDIT UNIT-TASK        repeat until no more unit tasks 

In this paper, we will present the control structure of our models by 
displaying them in the form of an indented outline (i.e., a tree structure) 
of goals and operators. The outline indicates the order in which the goals 
are set up and the operators evoked. We will also indicate in the outline 
the places where methods must be selected and informally annotate it to 
indicate the conditionality of the various goals and operators within a 
method. (A method is not always explicitly named in the outline, 
especially when it is the only method for accomplishing a goal; only its 
constituent goals and operators are shown.) The indentation above 
indicates that GOAL: EDIT UNIT-TASK is a subgoal of GOAL: EDIT 
MANUSCRIPT and the annotation in italics says that the subgoal is to 
be invoked repeatedly until no more unit tasks remain. 

To edit a unit task, the user must first get the unit task from the 
manuscript and then do what is demanded by the unit task: 

GOAL: EDIT UNIT-TASK 
    GOAL: GET UNIT-TASK 
    GOAL: DO UNIT-TASK 

Each subgoal will itself evoke appropriate methods. There is a simple 
method for getting a task: 

GOAL: GET UNIT-TASK 
    GET-NEXT-PAGE               if at end of manuscript page 
    GET-UNIT-TASK 

The operator GET-NEXT-PAGE is evoked only if there are no more 
edit instructions on the current page of the manuscript. The bulk of the 
work towards the goal (looking at the manuscript, finding the editing 
instruction, and interpreting the instruction as an edit task) is done by 
the operator GET-UNIT-TASK. 

To do a unit task in POET there is a two-step method: 

GOAL: DO UNIT-TASK 
    GOAL: LOCATE LINE 
    GOAL: MODIFY TEXT 

The POET editor must first be located at the line where the correction is 
to be made, and then the appropriate text on that line is modified. 
To locate POET at a line, there is a choice of two methods: 

GOAL: LOCATE LINE 
    [select: USE-LF-METHOD 
             USE-QS-METHOD] 

To use the LF-METHOD, the linefeed key is pressed repeatedly, 
causing the editor to advance one line forward each time. To use the 
QS-METHOD, a string in quotation marks is typed which identifies the 
line. Usually the LF-METHOD is selected when the new unit task is 
within a few lines of the previous unit task, and the QS-METHOD is 
selected when the new unit task is farther away. 




Once the line has been located, there is a choice of how to modify 
the text: 

GOAL: MODIFY TEXT 
    [select: USE-S-CMD 
             USE-M-CMD] 
    VERIFY-EDIT 

That is, either POET's Substitute command or Modify command can be 
used to alter text on a line, but in either case a VERIFY-EDIT operation 
is evoked to check what actually happened against the user's intentions. 

Putting all the pieces together into one tree structure, we have: 

GOAL: EDIT MANUSCRIPT 
. GOAL: EDIT UNIT-TASK                  repeat until no more unit tasks 
. . GOAL: GET UNIT-TASK 
. . . GET-NEXT-PAGE                     if at end of manuscript page 
. . . GET-UNIT-TASK 
. . GOAL: DO UNIT-TASK 
. . . GOAL: LOCATE LINE 
. . . . [select: USE-QS-METHOD 
. . . .          USE-LF-METHOD] 
. . . GOAL: MODIFY TEXT 
. . . . [select: USE-S-CMD 
. . . .          USE-M-CMD] 
. . . . VERIFY-EDIT 

The dots at the left of each line show the depth of the goal stack. 

To complete this model of manuscript-editing, we must add method 
selection rules that would determine the actual sub-methods at the two 
occurrences of "select". Due to the simplicity of the structure of the 
methods, we have used a simplified programming notation with the 
conditions written off to the right as notes. These conditions are not 
merely comments on the model, but are an integral part of the model, as 
are the selection rules. 
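
The outline can be read almost directly as a small program. The following is a minimal sketch, not the authors' notation: the function names, the task dictionary keys (`at_end_of_page`, `lines_ahead`), and the threshold `NEAR_LINES` are all assumptions introduced for illustration. It expands one unit task into its operator sequence, with the two "select" points supplied as rules.

```python
from typing import Callable, Dict, List

# Assumed threshold for "within a few lines"; the text gives no exact value here.
NEAR_LINES = 3

SelectionRules = Dict[str, Callable[[dict], str]]


def edit_unit_task(task: dict, select: SelectionRules) -> List[str]:
    """Expand GOAL: EDIT UNIT-TASK into the operators evoked for one unit task."""
    ops: List[str] = []
    # GOAL: GET UNIT-TASK
    if task.get("at_end_of_page", False):    # condition written at the right of the outline
        ops.append("GET-NEXT-PAGE")
    ops.append("GET-UNIT-TASK")
    # GOAL: DO UNIT-TASK
    ops.append(select["LOCATE LINE"](task))  # [select: USE-QS-METHOD / USE-LF-METHOD]
    ops.append(select["MODIFY TEXT"](task))  # [select: USE-S-CMD / USE-M-CMD]
    ops.append("VERIFY-EDIT")
    return ops


def edit_manuscript(tasks: List[dict], select: SelectionRules) -> List[List[str]]:
    """GOAL: EDIT MANUSCRIPT: repeat EDIT UNIT-TASK until no more unit tasks."""
    return [edit_unit_task(task, select) for task in tasks]


# Placeholder selection rules, for illustration only.
RULES: SelectionRules = {
    "LOCATE LINE": lambda t: "USE-LF-METHOD" if t["lines_ahead"] < NEAR_LINES else "USE-QS-METHOD",
    "MODIFY TEXT": lambda t: "USE-S-CMD",
}

print(edit_unit_task({"lines_ahead": 1}, RULES))
```

Keeping the selection rules separate from the method structure mirrors the point made above: the conditions and rules are part of the model, but they can be varied without changing the goal tree.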

The step by step behavior of the model in performing a unit task is 
traced in Table 4.1. The user is imagined to have a goal stack with the 
current goal being on top of the stack. New subgoals are pushed onto 
the stack and completed goals (whether satisfied or abandoned) are 
popped off the stack. The goals eventually cause operators to be 
executed. It is during the execution of the operators that interactions 
with the physical world take place. The user executes the operator 
GET-UNIT-TASK by turning to the manuscript, scanning it until he finds the 
next task, reading the instructions, and turning back to the terminal. The 
user executes the operator USE-S-CMD for the second task in Plate 1.2 
by typing sIdi<CR>idi<CR><CR> as described in the previous section. 

TABLE 4.1 
Trace of Model M4B During Performance of a Unit Task 

Step  Contents of Goal Stack(a)                     Operator Executed  User Action 
  1   (EDIT MS)(b) 
  2   (EDIT MS) (EDIT UT) 
  3   (EDIT MS) (EDIT UT) (GET UT) 
  4   (EDIT MS) (EDIT UT) (GET UT)                  GET-UT             (Looks at manuscript) 
  5   (EDIT MS) (EDIT UT) 
  6   (EDIT MS) (EDIT UT) (DO UT) 
  7   (EDIT MS) (EDIT UT) (DO UT) (LOCATE LINE) 
  8   (EDIT MS) (EDIT UT) (DO UT) (LOCATE LINE)     USE-LF-METHOD      (Types <LF>) 
  9   (EDIT MS) (EDIT UT) (DO UT) 
 10   (EDIT MS) (EDIT UT) (DO UT) (MODIFY-TEXT) 
 11   (EDIT MS) (EDIT UT) (DO UT) (MODIFY-TEXT)     USE-S-CMD          (Types sIdi<CR>idi<CR><CR>) 
 12   (EDIT MS) (EDIT UT) (DO UT) (MODIFY-TEXT)     VERIFY-EDIT        (Types /) 
 13   (EDIT MS) (EDIT UT) (DO UT) 
 14   (EDIT MS) (EDIT UT) 
 15   (EDIT MS) 

(a) Top of stack, that is, the current goal, is at the right. 
(b) To save space, the word GOAL: has been dropped from the beginning of goal 
expressions, MANUSCRIPT has been abbreviated MS, and UNIT-TASK has been abbreviated 
UT. 
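
The push and pop discipline described here can be made concrete with a short sketch. It assumes a hypothetical encoding in which a goal is a `(name, children)` tuple and an operator is a bare string, with the method selections for one unit task already made; none of these names come from the source. It emits one row per push, per operator execution, and per pop, in the manner of Table 4.1.

```python
from typing import List, Optional, Tuple, Union

Goal = Tuple[str, list]      # (goal name, ordered children)
Node = Union[str, Goal]      # a bare string stands for an operator
Row = Tuple[Tuple[str, ...], Optional[str]]

# One unit task with the method selections already fixed (an assumption).
UNIT_TASK: Goal = ("EDIT UT", [
    ("GET UT", ["GET-UNIT-TASK"]),
    ("DO UT", [
        ("LOCATE LINE", ["USE-LF-METHOD"]),
        ("MODIFY TEXT", ["USE-S-CMD", "VERIFY-EDIT"]),
    ]),
])


def trace(node: Node, stack: List[str], rows: List[Row]) -> None:
    """Record the goal stack after every push, operator execution, and pop."""
    if isinstance(node, str):            # operator: executed with the stack unchanged
        rows.append((tuple(stack), node))
        return
    name, children = node
    stack.append(name)                   # new subgoal pushed onto the stack
    rows.append((tuple(stack), None))
    for child in children:
        trace(child, stack, rows)
    stack.pop()                          # completed goal popped off the stack
    rows.append((tuple(stack), None))


rows: List[Row] = [(("EDIT MS",), None)]  # the top level goal is already in place
trace(UNIT_TASK, ["EDIT MS"], rows)

for step, (stack, op) in enumerate(rows, start=1):
    contents = " ".join(f"({goal})" for goal in stack)
    print(f"{step:2d}  {contents:45s} {op or ''}")
```

The current goal is the rightmost entry of each printed stack, and the operators appear only on rows where the stack does not change.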

Components of the GOMS Model 

Goals. A goal is a symbolic structure that defines a state of affairs to 
be achieved and determines a set of possible methods by which it may 
be accomplished. In the example, the goals are GOAL: EDIT 
MANUSCRIPT, GOAL: EDIT UNIT-TASK, GOAL: GET UNIT- 
TASK, GOAL: DO UNIT-TASK, GOAL: LOCATE LINE, and GOAL: 
MODIFY TEXT. The dynamic function of a goal is to provide a 
memory point to which the system can return on failure or error and 
from which information can be obtained about what is desired, what 
methods are available, and what has been tried already. 

Operators. Operators are elementary motor or information-processing 
acts, whose execution is necessary to change any aspect of the user's 
memory or to affect the task environment. In the example, the operators 
are: GET-NEXT-PAGE, GET-UNIT-TASK, USE-QS-METHOD, USE- 
LF-METHOD, USE-S-CMD, USE-M-CMD, and VERIFY-EDIT. The 
behavior of the user is ultimately recordable as a sequence of these 
operations. In the example traced in Table 4.1, the sequence of behavior 
is GET-UNIT-TASK, USE-LF-METHOD, USE-S-CMD, VERIFY-EDIT. 
The model does not deal with any fine structure of concurrent 
operation. 

An operator is defined by a specific effect (output) and by a specific 
duration. The operator may take inputs, and its outputs and duration 
may be a function of its inputs. An obvious example is a typing 
operator, whose input is the text to be typed, whose output is the 
keystroke sequence to the keyboard, and whose duration is 
(approximately) a linear function of the number of characters. 
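
As an illustration of an operator with an input, an output, and an input-dependent duration, here is a minimal sketch of the typing example. The class name and both coefficients are hypothetical; the text states only that duration is approximately linear in the number of characters.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TypeOperator:
    """A typing operator: input text, output keystrokes, linear duration."""
    seconds_fixed: float      # hypothetical constant overhead per execution
    seconds_per_char: float   # hypothetical time per keystroke

    def output(self, text: str) -> List[str]:
        """The keystroke sequence sent to the keyboard."""
        return list(text)

    def duration(self, text: str) -> float:
        """Duration as a linear function of the number of characters."""
        return self.seconds_fixed + self.seconds_per_char * len(text)


typing_op = TypeOperator(seconds_fixed=0.5, seconds_per_char=0.25)
print(typing_op.output("abcd"), typing_op.duration("abcd"))
```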

For a specific model the operators define the grain of analysis. In 
general, they embody an indeterminate mixture of basic psychological 
mechanisms and learned organized behavior, the mixture depending on 
the level at which the model is cast. The finer the grain of analysis, the 
more the operators reflect basic psychological mechanisms. The coarser 
the grain of analysis, the more the operators reflect the specifics of the 
task environment, such as the terminal, the physical arrangement, and the 
editor. 

Methods. A method describes a procedure for accomplishing a goal. 
The description of the procedure is cast as a conditional sequence of 
goals and operators, with conditional tests on the contents of the user's 
immediate memory and on the state of the task environment. In the 
example above, one of the methods was 

GOAL: GET UNIT-TASK 
    GET-NEXT-PAGE               if at end of manuscript page 
    GET-UNIT-TASK 

This method is associated with the goal GOAL: GET UNIT-TASK. It 
will give rise to either the operator sequence GET-NEXT-PAGE, 
GET-UNIT-TASK or the operator sequence GET-UNIT-TASK, depending on 
whether the test "at end of manuscript page" is true of the task 
environment at the time the test is performed. 

In the manuscript-editing task, the methods are sure of success, up to 
the possibility of having been mis-selected, the occurrence of errors of 
implementation, and the reliability of the equipment. By contrast, in 
problem solving tasks, such as the task faced by a novice in the Tower of 
Hanoi puzzle, methods have a chance of success distinctly less than 
certainty, due to the user's lack of knowledge or appreciation of the task 
environment. This uncertainty is a prime contributor to the problem 
solving character of a task; its absence is a characteristic of a routine 
cognitive skill. 

Methods are learned procedures which the user has at performance 
time. They are not plans that are created during a task performance. 
They constitute one of the major ways in which familiarity (skill) 
expresses itself. The particular methods that the user builds up from 
prior experience, analysis, and instruction reflect the detailed structure of 
the task environment. In the manuscript-editing task, they reflect 
knowledge of the exact sequence of steps required by the editor to 
accomplish specific tasks. 

Control Structure: Selection Rules. When a goal is attempted, there 
may be more than one method available to the user to accomplish the 
goal. The selection of which method is to be used need not be an 
extended decision process, for it may be that the task environment 
features dictate that only one method is appropriate. On the other hand, 
a genuine decision may be required. The essence of skilled behavior is 
that these selections are not problematical, that they proceed smoothly 
and quickly and without the eruption of puzzlement and search 
characteristic of problem solving behavior. 

In the GOMS model, method selection is handled by a set of 
selection rules. Each selection rule is of the form "if such and such is 
true in the current task situation, then use method M". Selection rules 
for the LOCATE goal of the example model might read: if the number 
of lines to the next modification is less than 3, use the LF-METHOD; 
else use the QS-METHOD. Such rules allow us to predict from 
knowledge of the task environment, in this case the number of lines to 
the target, which of several possible methods will be selected by the user 
in a particular instance. 
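
A selection rule of this if-then form is easy to state as a function. The sketch below is illustrative only: the function name is hypothetical, and the threshold is passed in as a parameter rather than fixed, since the appropriate value is an empirical matter for each user.

```python
def select_locate_method(lines_to_target: int, threshold: int) -> str:
    """Selection rule for the LOCATE goal: nearby targets use linefeeds,
    distant targets use a quoted string."""
    if lines_to_target < threshold:
        return "USE-LF-METHOD"
    return "USE-QS-METHOD"


# Predicted selections for targets at increasing distances (threshold assumed).
for distance in (1, 2, 5, 20):
    print(distance, select_locate_method(distance, threshold=3))
```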

Limitations of the GOMS Model 

For error-free behavior, the GOMS model provides a complete 
dynamic description of behavior, measured at the level of goals, methods, 
and operators. Given a specific task (i.e., a specific manuscript), this 
description can be expanded into a sequence of operations (operator 
executions). By associating times with each operator, such a model will 
make total time predictions. If these times are given as distributions, it 
will make statistical predictions. But, without augmentation, the model 
will not make predictions if errors occur. Yet, errors exist in routine 
cognitive skilled behavior. Indeed, error rates may not even be small, in 
the sense of having negligible frequency, taking negligible time, or 
having negligible consequences. What is true of skilled behavior is that 
the detection and correction of errors is mostly routine. It cannot be 
entirely routine, since rare types of errors for which the user is 
unprepared are always possible (e.g., the terminal catching fire, the editor 
performing incorrectly, etc.). But, in the main, errors are quickly 
detected and converted to the additional time to correct the error. The 
final result of the behavior remains relatively error free and can be 
characterized solely by the time to completion. Thus, errors can be 
converted to variance in operator times, so that the GOMS theory can be 
applied to actual behavior at the price of degraded accuracy. 
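
The step from an operator sequence to a time prediction can be sketched as follows. This is a simplified illustration, not the procedure used in the studies: it assumes operator times are independent, so that means and variances add, and every numeric value in it is a placeholder invented for the example.

```python
import math
from typing import Dict, Iterable, Tuple

OperatorTimes = Dict[str, Tuple[float, float]]   # operator -> (mean sec, SD sec)


def predict_total_time(sequence: Iterable[str], op_times: OperatorTimes) -> Tuple[float, float]:
    """Return (mean, SD) of total time for a sequence of operator executions.

    Assumes independent operator times, so variances add.
    """
    ops = list(sequence)
    mean = sum(op_times[op][0] for op in ops)
    variance = sum(op_times[op][1] ** 2 for op in ops)
    return mean, math.sqrt(variance)


# Placeholder values only; they are not measurements from the experiment.
HYPOTHETICAL_TIMES: OperatorTimes = {
    "GET-UNIT-TASK": (2.0, 0.5),
    "USE-LF-METHOD": (3.0, 1.0),
    "USE-S-CMD": (6.0, 2.0),
    "VERIFY-EDIT": (1.0, 0.5),
}

sequence = ["GET-UNIT-TASK", "USE-LF-METHOD", "USE-S-CMD", "VERIFY-EDIT"]
print(predict_total_time(sequence, HYPOTHETICAL_TIMES))
```

Under this reading, time spent detecting and correcting errors simply widens the SD attached to each operator, which is one way to picture the "price of degraded accuracy" mentioned above.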

To cover the full range of human behavior, the general theory of the 
human information processing systems requires a more flexible control 
structure, such as a production system (Newell & Simon, 1972; Newell, 
1973). In fact, some of the models to be described have been expressed 
as production systems and executed as computer simulations. Since in 
this chapter the analysis will not be carried deeply into the behavior of 
errors, the GOMS model contains an adequate specification of control. 

Model Grain 

The example model displayed above is not the only possible GOMS 
model for the manuscript editing task. Models could be given with 
either more or less detail. Thus, there is an important issue of the 
appropriate grain of the analysis. 

A priori, it is not possible to know which grain size is appropriate. 
As the grain of the analysis becomes finer, the model successively 
accumulates opportunities for conditional behavior (either optional 
application of some method or differentiation into cases). Thus, from 
one point of view, models at finer grain should be more accurate. But at 
a finer grain the low-level operators are combined to form functional 
units that a coarser grain analysis would reflect directly. It has been 
known for some time (Abruzzi, 1956; Smith & Harris, 1954), that the 
time required for an operator in a sequence of operators, especially one 
in a fine grain analysis, may depend on other operators in the sequence. 
Furthermore, there is typically greater error in the measurement of finer 
grain operators than of coarser grain operators. Thus, a finer grain 
analysis might actually be less accurate. 

In order to see how the grain of analysis affects the accuracy of the 
models, we will redo the analysis at several levels of detail, comparing 
the resulting models. There appear to be two essentially independent 
dimensions along which the grain of analysis can be made finer or 
coarser. The primary dimension involves the duration of the operators. 
Given that human primitive operators are something less than a tenth of 
a second, many levels of time aggregation are possible. The second 
dimension involves the amount of differentiation between operators, i.e., 
the degree to which conditionality is suppressed and alternative operators 
(or sequences of operators) are considered to be the same operator. Such 
case-analysis aggregation can happen at any level of time aggregation. 

We will explore variations of GOMS models along both of these 
dimensions. Table 4.2 describes the family of nine manuscript-editing 
models we will consider, and Tables 4.3 to 4.5 lay out the models 
themselves. Each model is given a name of the form Mld. The l 
indicates the level, i.e., the order of magnitude time grain of the model in 
seconds. For convenience, we consider models which increase roughly in 
powers of 2 sec. Thus, M16A has operators whose durations are on the 
order of 2^4 seconds; M0.5A has operators closest to 2^-1 sec. This latter is 
the measurement limit in the experimental arrangement. 

Within each level we consider various degrees of differentiation. 
There is no convenient metric here, nor do differentiations at the 
different levels correspond, so we simply assign arbitrary letter 
labels (the d in the model name). However, we do adopt the convention 
that model MlA is the most aggregated model at level l, i.e., the one that 
collapses conditional sequences into each other as much as possible. 

At the most aggregated level, Model M16A, shown in Table 4.3, 
consists of a single operator, EDIT-UNIT-TASK. The goal of 
manuscript-editing is accomplished by repeating this operator for each 
unit task. With a single operator, M16A always predicts that it takes the 
same amount of time to do a unit task, hence the same amount of time 
to do a total job of n unit tasks. Level 4 models come from 
decomposing the unit task into its invariant functional cycle: (1) get the 
next edit task, (2) locate the editor at the line on which the correction is 
to be made, (3) modify the line of text, and (4) verify that the edit was 
done correctly. Level 2 models arise by decomposing the methods used 
at Level 4 into the individual steps of specifying commands and 
arguments (see Table 4.4). 

TABLE 4.2 
Description of GOMS Models Tested 

Level 16 Models 

M16A    Constant time per unit task. Only one operator: EDIT-UNIT-TASK. 

Level 4 Models 

M4A     Single operator for each functional step in unit task sequence: GET-UNIT-TASK, 
        LOCATE-LINE, MODIFY-TEXT, VERIFY-EDIT. 

M4B     Like M4A, but with operators LOCATE-LINE and MODIFY-TEXT broken into 
        separate cases based on methods used to accomplish them. 

Level 2 Models 

M2A     Like M4B, but with operators at the level of typing a system command 
        (SPECIFY-CMD) or typing an argument to a command (SPECIFY-ARG). 

M2B     Like M2A, but with SPECIFY-CMD and SPECIFY-ARG broken into separate cases 
        according to whether they involve an implicit need for the user to get 
        information from the manuscript (suffix /G) or not (suffix /NG). 

M2C     Like M2A, but with SPECIFY-CMD and SPECIFY-ARG broken into separate cases 
        according to four method contexts: quoted string method (/Q), first argument to 
        substitute command (/S1), second argument to substitute command (/S2), or modify 
        command (/M). 

M2D     Like M2A, but with all the distinctions in both M2B and M2C combined 
        multiplicatively. 

Level 0.5 Models 

M0.5A   Like M2B, but with operators at the level of basic perceptual, cognitive, and motor 
        actions: LOOK-AT, HOME, TURN-PAGE, TYPE, and MOVE-HAND. All mental 
        actions not overlapped with motor operations represented as MENTAL operator. 

M0.5E   Like M0.5A, but with MENTAL broken down into SEARCH-FOR, COMPARE, 
        CHOOSE-CMD, and CHOOSE-ARG. 

M0.5E'  Like M0.5E, but typing time is a constant (i.e., not parameterized by the number of 
        keystrokes). 

TABLE 4.3 
Level 16 and Level 4 Models 

Model M16A: 

(GOAL: EDIT MANUSCRIPT) 
. (EDIT-UNIT-TASK)                      repeat until no more unit tasks 

Model M4A: 

(GOAL: EDIT MANUSCRIPT) 
. (GOAL: EDIT UNIT-TASK)                repeat until no more unit tasks 
. . (GOAL: GET UNIT-TASK)               if task not remembered 
. . . (GET-NEXT-PAGE)                   if at end of manuscript page 
. . . (GET-UNIT-TASK) 
. . (GOAL: DO UNIT-TASK)                if an edit task was found 
. . . (LOCATE-LINE)                     if task not on current line 
. . . (MODIFY-TEXT) 
. . . (VERIFY-EDIT) 

Model M4B: 

(GOAL: EDIT MANUSCRIPT) 
. (GOAL: EDIT UNIT-TASK)                repeat until no more unit tasks 
. . (GOAL: GET UNIT-TASK)               if task not remembered 
. . . (GET-NEXT-PAGE)                   if at end of manuscript page 
. . . (GET-UNIT-TASK) 
. . (GOAL: DO UNIT-TASK)                if an edit task was found 
. . . (GOAL: LOCATE LINE)               if task not on current line 
. . . . [select (USE-QS-METHOD) 
. . . .         (USE-LF-METHOD)] 
. . . (GOAL: MODIFY TEXT) 
. . . . [select (USE-S-CMD) 
. . . .         (USE-M-CMD)] 
. . . . (VERIFY-EDIT) 

Both Levels 4 and 2 are driven by the structure of the POET 
commands. These are themselves reflections of the demands of the task 
as it is defined in the manuscript. At Level 0.5 an entirely different set of 
operators (see Table 4.5) comes into view which are not defined 
functionally by their role in a command language, but are defined by 
reference to the basic physical and mental actions of the user — typing, 
looking, moving a hand, and various mental operations. These operators, 
unlike the operators at other levels, are task free. 

The cost of obtaining the estimates of all the different operators and 
selection rules increases as the size of the operators decreases, because 
more data is required for a given level of robustness and because the 
observation and measurement problems increase at the lower levels. A 
possible compensation for the greater cost of using the Level 0.5 
operators is that, unlike the larger operators, it may not be necessary to 
determine lower level operators for each experimental task. 



4.2 EXPERIMENTAL METHOD 

In order to evaluate the usefulness of the GOMS model for 
describing user behavior in the manuscript editing task, an experiment 
was conducted in which detailed observations were made of users editing 
a prescribed manuscript. Although this is a laboratory setting, an effort 
was made to make the situation as naturalistic as possible from the users' point of 
view: the physical surroundings, the task, the terminal, and the editor 
were familiar as part of the users' daily activities. The manuscript and 
the modifications to be made on it were selected to be typical. 

Subjects. Subjects were four employees of the Xerox Palo Alto 
Research Center. Three (S4, S13, and S22) were or had been 
professional secretaries, one (S1) was a programmer. All had at least a 
year of daily experience using the editor. 

Manuscript. The manuscript was an 11 page memo. Each page was 
8-1/2 by 11 inches, with 55 lines of text and 70 characters per line, 
printed unjustified in a 10-point fixed-pitch font. There were 73 
different modifications marked with a red pen, giving an average density 
of one modification every 8.3 lines, or 6.6 modifications per page (from 3 
to 11 on any one page). An effort was made to vary the number of lines 

TABLE 4.4 
Level 2 Models 

Model M2A: 

(GOAL: EDIT MS) 
. (GOAL: EDIT UT)                       repeat until no more unit tasks 
. . (GOAL: GET UT)                      if task not remembered 
. . . (GET-NEXT-PAGE)                   if at end of manuscript page 
. . . (GET-FROM-MS) 
. . (GOAL: DO UT)                       if an edit task was found 
. . . (GOAL: LOCATE LINE)               if task not on current line 
. . . . [select (GOAL: USE QS-METHOD) 
. . . .           (SPECIFY-CMD) 
. . . .           (SPECIFY-ARG) 
. . . .         (GOAL: USE LF-METHOD) 
. . . .           (SPECIFY-CMD)]        repeat until at line 
. . . . (VERIFY-LOC) 
. . . (GOAL: MODIFY TEXT) 
. . . . [select (GOAL: USE S-CMD) 
. . . .           (SPECIFY-CMD) 
. . . .           (SPECIFY-ARG) 
. . . .           (SPECIFY-ARG) 
. . . .         (GOAL: USE M-CMD) 
. . . .           (SPECIFY-CMD) 
. . . .           (SPECIFY-CMD)         repeat until at text 
. . . .           (SPECIFY-ARG) 
. . . .           (SPECIFY-CMD)] 
. . . . (VERIFY-EDIT) 

Model M2B: 

SPECIFY-CMD  =>  SPECIFY-CMD/G, SPECIFY-CMD/NG 
SPECIFY-ARG  =>  SPECIFY-ARG/G, SPECIFY-ARG/NG 

Model M2C: 

SPECIFY-ARG  =>  SPECIFY-ARG/Q, SPECIFY-ARG/M, 
                 SPECIFY-ARG/S1, SPECIFY-ARG/S2 

Model M2D: 

SPECIFY-CMD  =>  SPECIFY-CMD/G, SPECIFY-CMD/NG 
SPECIFY-ARG  =>  SPECIFY-ARG/Q/G, SPECIFY-ARG/Q/NG, 
                 SPECIFY-ARG/M/G, SPECIFY-ARG/M/NG, 
                 SPECIFY-ARG/S1/G, SPECIFY-ARG/S1/NG, 
                 SPECIFY-ARG/S2/G, SPECIFY-ARG/S2/NG 






TABLE 4.5 
Model M0.5E 



(GOAL: EDIT MS) 
. (GOAL: EDIT UT)                       repeat until no more unit tasks 
. . (GOAL: GET UT)                      if task not remembered 
. . . (GOAL: TURN PAGE)*                if at end of manuscript page 
. . . (GOAL: GET-FROM MS)* 
. . (GOAL: DO UT)                       if an edit task was found 
. . . (GOAL: LOCATE LINE)               if task not on current line 
. . . . (CHOOSE-CMD) 
. . . . [select (GOAL: USE QS-METHOD) 
. . . .           (GOAL: SPECIFY CMD)* 
. . . .           (GOAL: SPECIFY ARG)* 
. . . .         (GOAL: USE LF-METHOD) 
. . . .           (GOAL: SPECIFY CMD)*] repeat until at line 
. . . . (GOAL: VERIFY LOCATION)* 
. . . (GOAL: MODIFY TEXT) 
. . . . (CHOOSE-CMD) 
. . . . [select (GOAL: USE S-CMD) 
. . . .           (GOAL: SPECIFY CMD)* 
. . . .           (GOAL: SPECIFY ARG)* 
. . . .           (GOAL: SPECIFY ARG)* 
. . . .         (GOAL: USE M-CMD) 
. . . .           (GOAL: SPECIFY CMD)* 
. . . .           (GOAL: SPECIFY CMD)*  repeat until at text 
. . . .           (GOAL: SPECIFY ARG)* 
. . . .           (GOAL: SPECIFY CMD)*] 
. . . . (GOAL: VERIFY EDIT)* 

*(GOAL: TURN PAGE) 
. (LOOK-AT MS) 
. (ACTIVE) 
. (MOVE-HAND) 
. (TURN-PAGE) 
(GOAL: GET-FROM MS) 
. (LOOK-AT MS) 
. (SEARCH-FOR) 
. (LOOK-AT D) 
(GOAL: SPECIFY CMD) 
. (GOAL: GET-FROM MS)* 
. (CHOOSE-CMD) 
. (GOAL: TYPE STRING)* 
(GOAL: SPECIFY ARG) 
. (GOAL: GET-FROM MS)* 
. (CHOOSE-ARG) 
. (GOAL: TYPE STRING)* 
(GOAL: VERIFY) 
. (LOOK-AT D) 
. (GOAL: GET-FROM MS)* 
. (COMPARE) 
(GOAL: TYPE STRING) 
. (HOME) 
. (LOOK-AT K) 
. (LOOK-AT D) 
. (TYPE STRING) 



repeat twice 
repeat twice 



optional 

if not all 

if not already selected 

optional 



if not already selected 
lay 



optional 



optional 
optional 
optional 

between consecutive modifications and to place an equal number of 
modifications in each of the left, right, and middle portions of the page. 
The marked modifications were relatively short: four of them were 
deletions (of an average of 5.5 characters), 26 were insertions (of an 
average of 2.9 characters), and 40 were replacements (of an average of 
4.1 characters by 4.4 characters). The paragraph of Figure 1.2 was taken 
from the manuscript and illustrates the style in which modifications were 
indicated to the user. 
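
The density figures quoted for the manuscript follow directly from its dimensions, as this small check shows. The variable names are illustrative; the three input values are the ones stated in the Manuscript paragraph.

```python
pages = 11
lines_per_page = 55
modifications = 73

# Average spacing between modifications, in lines of text.
lines_per_modification = pages * lines_per_page / modifications
# Average number of modifications on a page.
modifications_per_page = modifications / pages

print(round(lines_per_modification, 1), round(modifications_per_page, 1))
```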

Terminal. Two terminals were used in the experiment: a Texas 
Instruments (TI) "Silent 700" (prints on paper at 30 characters/sec) and a 
CRT display 8-1/2 inches wide by 10-3/4 inches high (42 lines, with 72 
characters per line). Text was displayed on the CRT at a maximum rate 
of 6 lines per sec. The display was programmed to operate according to 
a simple scrolling discipline (the same discipline used on the hardcopy 
terminal): each new line was displayed at the bottom of the screen with 
the other lines scrolling up to make room, i.e., the last 42 lines of an 
interaction were visible on the screen. 

Measurement apparatus. The terminal was connected to a large 
computer running the POET editor under the TENEX time-sharing 
system. For this experiment the terminal was modified to time-stamp 
and record on a data file all input and output events. It should be noted 
that the accuracy of the timing of events did not depend on the response 
of the time-sharing system. Accuracy of time-stamping was to within 32 
msec of the actual time of the event at the terminal. The average 
response time of the editor to commands during the experiment was 0.8 
sec (SD = 0.6 sec). 

Two television cameras were focussed on the user, one camera giving 
an overall view of the situation, the other closely focussed on the user's 
face from which it could be determined whether he was looking at the 
manuscript, the keyboard, or the CRT. Pictures from the cameras were 
electronically combined to form a single split image recorded on 
videotape. The user wore a lapel microphone, recording onto the 
soundtrack of the videotape. A digital clock was electronically mixed 
with the video picture, time-stamping each frame. The times measured 
from video frames were accurate to 33 msec (one video frame). 

Procedure. The user was seated before the terminal with the 
manuscript to his left. He first performed a short editing task on another 
manuscript for warmup and to ensure that he understood what to do. In 
the first two sessions run, the users were instructed to proceed through 
the manuscript inserting an asterisk at the beginning of the line, since 
these two were run to investigate only methods for locating the target 
line. For the other three sessions, the users were instructed to edit the 
11 page manuscript, which took approximately 20 minutes. 

4.3 RESULTS 



4.3.1 Operator Sequences 

Is it possible to predict the actual sequence of operators a person will 
use to do the task? In order to predict operator sequences, it is necessary 
to be able to predict method selections. Consequently, the method 
selections of the users were investigated. To get an indication of how well 
the models, in concert with the selection rules, could predict operator 
sequences, one user's protocol was singled out for intensive analysis. 
Predictions were made of the operator sequence she would use for each 
of the tasks in the manuscript. The sequence of operators the user 
actually employed was determined and the two compared. The 
comparison was repeated for each model. 

Method Selection Rules 

As we saw in the description of model M4B and as indicated by the 
"select" in Table 4.3, there are two places where, for a given goal, the 
user has a choice of methods. The first method selection comes in 
deciding how to "locate the line," that is, how to make the Current Line 
of the editing system be the line containing the text to be modified (the 
LOCATE goal). The second method selection comes in choosing 
between commands for making the text modification (the MODIFY 
goal). The users' behavior in the five experimental sessions was 
examined to identify the methods they employed in the service of these 
goals. Table 4.6 gives the methods observed and the frequencies with 
which the methods were selected. QS-METHOD and LF-METHOD are 
the methods previously described for the LOCATE goal. S-CMD and 
M-CMD are the methods previously described for the MODIFY goal. 
The others are additional methods that were used less frequently and 
may be described as follows: 

+N-METHOD. The user estimates the number of lines, n, 
to the next unit task, then types the command +n/, which causes 
POET to advance n lines and print the line. It is assumed that 
a correction may be needed, e.g., the user may have to type a 
few linefeeds (each of which moves him down a line), ↑'s (each 
of which moves him up a line), or may even have to repeat the 
+n/ command with a new n. 
TABLE 4.6 
Frequency of Method Selections 

                             Experimental Session 
                  1          2          3          4          5 
User              S1(a)      S4         S4         S22        S13 
Terminal          TI         TI         CRT        CRT        CRT 

LOCATE Methods: 
  QS-METHOD       44 (65%)    1 ( 2%)    0 ( 0%)   40 (62%)   46 (68%) 
  LF-METHOD       11 (16%)   14 (21%)   45 (68%)   25 (38%)   21 (31%) 
  +N-METHOD        2 ( 3%)   51 (77%)   20 (30%)    0 ( 0%)    0 ( 0%) 
  AN-METHOD       11 (16%)    0 ( 0%)    1 ( 2%)    0 ( 0%)    1 ( 1%) 

MODIFY Methods: 
  S-CMD           (b)        48 (73%)   (b)        57 (86%)   63 (93%) 
  M-CMD           (b)        18 (27%)   (b)         9 (14%)    4 ( 6%) 
  C-CMD           (b)         0 ( 0%)   (b)         0 ( 0%)    1 ( 1%) 

(a) S1 was the only subject who was a programmer. 
(b) No MODIFY method data was collected. 

AN-METHOD. The user selects an easily specified "anchor" 
line near the target line, e.g., a blank line (specified by the 
empty string ""), the last line of a page (which has the special 
symbol $), or a line that has a short unique string, such as a 
paragraph number. Then the target line is reached by using 
linefeeds or ↑'s. For example, the command "" followed by a 
linefeed locates POET at the first line of the next paragraph. 

A striking feature of the numbers in Table 4.6 is that each user clearly has 
a default method. By knowing only the default method of the user, his 
method selection can be predicted correctly about 68% of the time for 
the LOCATE goal and 84% of the time for the MODIFY goal. 
Apparently, the user will use this default method unless it is obviously 
inefficient (as in linefeeding a line at a time through ten pages of text to 
get the next task). 
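
The 68% and 84% figures can be reproduced, as a rough check, by pooling the counts in Table 4.6. The sketch below assumes that "default method" means each user's most frequently selected method and that the percentages are pooled over sessions; the dictionary and function names are illustrative, not from the study.

```python
# Counts read from Table 4.6: (selections of the user's most frequent method,
# total method selections in that session). Pooling over sessions is an assumption.
LOCATE = {
    "S1/TI": (44, 68), "S4/TI": (51, 66), "S4/CRT": (45, 66),
    "S22/CRT": (40, 65), "S13/CRT": (46, 68),
}
MODIFY = {"S4/TI": (48, 66), "S22/CRT": (57, 66), "S13/CRT": (63, 68)}


def default_hit_rate(counts):
    """Fraction of selections predicted correctly by always guessing the default."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return hits / total


print(f"LOCATE: {default_hit_rate(LOCATE):.0%}")
print(f"MODIFY: {default_hit_rate(MODIFY):.0%}")
```

Pooled this way, the LOCATE rate is 226 of 333 selections and the MODIFY rate is 168 of 200, which round to the percentages quoted in the text.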

By taking into account other features of the task environment, the 
prediction of which method the user will select can be improved. The 
most important characteristic of the task environment to consider for 
GOAL: LOCATE methods is the number of lines D between the 
Current Line and the line with the text to be modified next. As is clear 
from Table 4.7, all users used the LF-METHOD if the next line was 
close enough. Where the users differed was in the threshold for how far 
away the target had to be before they shifted to other methods. The 
time required to use the LF-METHOD is sensitive to the speed of the 
terminal. (Each time the user types linefeed, the system prints out the 
new Current Line.) It is not surprising, therefore, that the threshold for 
when to abandon the LF-METHOD is lower when the user is using a 
slow terminal than when he is using a fast one. For the slow 30 char/sec 
TI terminal, both users shifted at D = 3 lines. For the faster display 
terminals, two users shifted at D = 5 lines. The other user (for whom 
the LF-METHOD was the default) held out until D = 10 lines. 

The complete prediction of which method each user will employ for 
the LOCATE goal is organized as a set of Selection Rules in Table 4.8. 
Each row gives the results of the accumulation of rules R1 to Rn, adding 
rules one at a time. The Hits column shows the total number of cases 
correctly predicted. Misses shows the number of cases in which the 
prediction was wrong (Hits + Misses = the total number of method 
selections). As each rule is added, the set of rules taken together predicts 
more cases correctly, but a few individual cases which were predicted 
correctly may now be missed. For example, adding rule R2 in the 
second line of the table correctly predicts 11 method selections of the 24 
that had been missed using Rule R1 alone, but at the cost of missing 2 
of the 44 that were previously hits, a net gain of 9. As the table shows, 
TABLE 4.7 
Frequency of LOCATE Methods by D 

User(a)   Method   Number of lines from current line to line containing target (D) 
                   1   2   3   4   5   6   7   8   9   10-14   15+ 



S1 


LF 


8 


3 


P 




















(TI) 


QS 




2 
1 


1 
1 


4 
1 


5 


2 






3 


4 


8 


15 




AN 






1 


1 


2 








1 


1 


3 


3 


S4 


LF 


8 


4 


1 


1 




1 














(TI) 


QS 






1 


















1 




+N 




1 


1 


5 


5 


3 






4 


4 


11 


17 




AN 






1 




















S4 


LF 


6 


7 




6 


5 


3 






3 


2 


1 2 


10 


(CRT) 


QS 
+ N 
AN 












1 






1 


2 


1 
1 9 

1 


7 

1 


S22 


LF 


6 


5 




6 


5 


1 


1 






1 






(CRT) 


QS 
+ N 
AN 




1 






1 


2 


1 

1 
1 




4 


4 


10 


18 


S13 


LF 


8 


4 




4 


3 


1 


1 










1 


(CRT) 


QS 
+ N 
AN 








3 


2 


3 


1 
1 

1 




3 


4 


12 


17 
1 


MS Total(b) 


8 


6 




6 


5 


4 







4 


4 


11 


19 



(a) The vertical bar indicates where LF-METHOD stops being the preferred method. 
(b) Frequency of D's taking the tasks over the whole manuscript in order. Since users usually did 
some edits in a different order, totals for different experiments in the same column are not 
necessarily equal. 
TABLE 4.8 
Selection Rules for LOCATE Goal 

                                                         This Rule     Cumulative 
User    Rule                                             Gain  Lose    Hits  Misses  %Hits 

S1      R1: Use the QS-METHOD unless another rule          44            44     24    65% 
(TI)        applies. 
        R2: If D < 3, use the LF-METHOD.                   11     2      53     15    78% 
        R3: If the target line is the last line of the      5            58     10    85% 
            page, use the AN-METHOD (with $). 
        R4: If the current method is to use paragraph                    60 
            numbers for search strings and the target 
            line is near a paragraph number, then use 
            the AN-METHOD. 

S4      R1: Use the +N-METHOD unless another rule          51            51     15    77% 
(TI)        applies. 
        R2: If D < 3, use the LF-METHOD.                   12     1      62      4    94% 

S4      R1: Use the LF-METHOD unless another rule          45            45     21    68% 
(CRT)       applies. 
        R2: If D > 9, use the +N-METHOD.                   16    12      49     17    74% 
        R3: If the target line is on the next page of                    56     10    85% 
            the manuscript, use the LF-METHOD. 

S22     R1: Use the QS-METHOD unless another rule          40            40     26    61% 
(CRT)       applies. 
        R2: If D < 5, use the LF-METHOD.                   22     2      60      6    91% 

S13     R1: Use the QS-METHOD unless another rule          46            46     22    68% 
(CRT)       applies. 
        R2: If D < 5, use the LF-METHOD.                   19     5      60      8    88% 
        R3: If D = 3 or 4 and Column > 25,                  4     2      62      6    91% 
            then use the QS-METHOD. 

it is possible to predict the method selection for users an average of 90% 
of the time using from two to four rules. 
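
The bookkeeping behind Table 4.8 can be sketched in a few lines: each added rule contributes its Gain minus its Lose to the running Hits total. The example below uses S13's rows from the table and then averages the final hit rates of the five sessions. The function name and the choice of an unweighted average over sessions are assumptions made for illustration.

```python
def accumulate_hits(rules):
    """Running Hits total after each rule, given (gain, lose) pairs in rule order."""
    hits, running = 0, []
    for gain, lose in rules:
        hits += gain - lose
        running.append(hits)
    return running


# S13 (CRT) rows of Table 4.8: R1 gains 46; R2 gains 19, loses 5; R3 gains 4, loses 2.
s13_hits = accumulate_hits([(46, 0), (19, 5), (4, 2)])

# Final (hits, total selections) per session; totals are Hits + Misses for rule R1.
FINAL = {
    "S1/TI": (60, 68), "S4/TI": (62, 66), "S4/CRT": (56, 66),
    "S22/CRT": (60, 66), "S13/CRT": (62, 68),
}
mean_final_rate = sum(h / t for h, t in FINAL.values()) / len(FINAL)

print(s13_hits)
print(f"{mean_final_rate:.0%}")
```

The running totals for S13 match the Hits column of the table, and the unweighted mean of the final hit rates comes out close to the 90% quoted above.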

Data for Detailed Studies of User S13 

In order to compare the operator sequence predicted by the models 
with a sequence actually employed, the recorded behavior of one user 
S13 was subjected to intensive analysis. S13 was a highly skilled female 
secretary (typing rate 103 words per minute) with about two years 
experience on the POET editor, much of it with the type of terminal 
used in this experiment. 

The video-taped record of her behavior and the time-stamped file of 
keystrokes were combined into a protocol, a fragment of which is 
reproduced in Table 4.9. The protocol was coded directly from the 
video-tape and the keystroke file using a set of descriptive operators not 
related a priori to any model. The overwhelming bulk of behavior was 
coded by the operators TYPE, LOOK-AT, and MENTAL, defined as 
follows: 

TYPE(char1 char2 ...)  A burst of typewriting starting with the 
            beginning of the finger trajectory toward the first key 
            and ending when the last key makes contact. A 
            "burst" is defined as a sequence of keystrokes with no 
            more than 300 msec between successive key contacts. 

LOOK-AT(place)  Act of looking from one place to another. A 
            place is either the CRT, the keyboard, or the 
            manuscript. LOOK-AT includes the physical head 
            movement and gross eye movement, but does not 
            include any perceptual scanning within a place (such as 
            searching a manuscript page for a new task). 

MENTAL      Generic operator for mental activity that does not 
            overlap with physical operations. 
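
The 300 msec criterion in the TYPE definition amounts to a simple segmentation rule over the time-stamped keystroke file. The following is a minimal sketch of that rule, assuming key-contact times in seconds; the function name and the sample timestamps are hypothetical. It groups key contacts only, so it does not capture the part of a burst spent on the finger trajectory toward the first key.

```python
def split_bursts(contact_times, max_gap=0.300):
    """Group key-contact times (seconds, ascending) into typing bursts.

    A new burst starts whenever the gap since the previous contact
    exceeds max_gap (300 msec in the TYPE definition).
    """
    bursts = []
    for t in contact_times:
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


# Hypothetical key-contact times, for illustration only.
sample = [0.00, 0.12, 0.25, 0.80, 0.95, 2.00]
for burst in split_bursts(sample):
    print(len(burst), "keystrokes starting at", burst[0])
```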



Other operators, used infrequently, were HOME(hand place) for 
moving a hand to the keyboard preparatory to typing, 
MOVE-HAND(hand place) for other hand movements, TURN-PAGE, 
ACTION(description), and EXPRESSION(description). The last two 
were miscellaneous categories for recording other behavior. 

TABLE 4.9 
Protocol Record of One Unit Task 

Start       Stop        ΔT        Operator               Description 
(min:sec)   (min:sec)   (sec) 

18:56.33    18:56.73    0.40      LOOK-AT(MANUSCRIPT) 
18:56.73    18:58.89    2.16      MENTAL 
18:58.89    18:59.41    0.52      HOME(LEFT) 
18:59.41    18:59.66    0.25      MENTAL                 [This protocol describes 
18:59.66    18:59.94    0.23(a)   LOOK-AT(KEYBOARD)      the behavior during 
18:59.89    19:00.14    0.25      TYPE(")                the last unit task 
19:00.14    19:00.24    0.10      MENTAL                 shown in Figure 1] 
19:00.24    19:00.48    0.24      LOOK-AT(CRT) 
19:00.48    19:01.11    0.63      MENTAL 
19:01.11    19:01.43    0.32      LOOK-AT(KEYBOARD) 
19:01.43    19:01.70    0.27      MENTAL 
19:01.70    19:01.82    0.12      TYPE(e) 
19:01.82    19:01.92    0.10      MENTAL 
19:01.92    19:02.66    0.07(a)   TYPE(x is <CR> /) 
19:01.99    19:02.34    0.35      LOOK-AT(CRT) 
19:02.34    19:04.16    1.82      MENTAL 
19:04.16    19:04.53    0.37      LOOK-AT(MANUSCRIPT) 
19:04.53    19:05.48    0.95      MENTAL 
19:05.48    19:05.83    0.15(a)   LOOK-AT(CRT) 
19:05.63    19:05.91    0.28      TYPE(. s) 
19:06.06    19:06.40    0.13(a)   LOOK-AT(KEYBOARD) 
19:06.19    19:06.50    0.24      MENTAL 
19:06.74    19:06.86    0.07(a)   TYPE(-) 
19:06.81    19:07.18    0.32(a)   LOOK-AT(MANUSCRIPT) 
19:07.13    19:07.25    0.12      TYPE(e) 
19:07.25    19:07.51    0.26      MENTAL 
19:07.51    19:07.63    0.12      TYPE(x) 
19:07.63    19:09.46    1.83      MENTAL 
19:09.46    19:09.65    0.19      TYPE(<CR>) 
19:09.65    19:09.92    0.27      MENTAL 
19:09.92    19:10.04    0.12      TYPE(e) 
19:10.04    19:10.11    0.07      MENTAL 
19:10.11    19:10.46    0.00(a)   LOOK-AT(CRT) 
19:10.11    19:10.72    0.61      TYPE(x <CR> <CR> /) 
19:10.72    19:11.76    1.04      MENTAL 

(a) Time ΔT charged to operator is less than the difference between the Start and Stop clock times 
because the operator overlaps with the next operator. 

The first three unit tasks were discarded before analysis to minimize 
any warmup effect. The remaining 70 unit tasks were partitioned into 
two comparable data sets: a Derivation data set, consisting of the 36 unit 
tasks on the odd-numbered pages, and a Crossvalidation data set 
consisting of the 34 unit tasks on the even-numbered pages. This partition 
allowed basic operator statistics to be computed on the Derivation set 
while preserving the Crossvalidation data set for an attempt at prediction 
in a matched situation where no statistical advantage has been taken of 
chance. Table 4.10 lays out the gross unit task time statistics for these 
two data sets. 

TABLE 4.10 
Unit Task Times For S13's Protocol 

                                  N      Mean     SD 
                                         (sec)    (sec) 
All Unit Tasks: 
  Derivation Data                 36     13.37     8.87 
  Crossvalidation Data            34     19.46    22.97 
  Total                           70     16.33    17.37 

Error Unit Tasks: 
  Derivation Data                 10     16.96    15.13 
  Crossvalidation Data            15     29.46    32.17 
  Total                           25     24.46    26.99 

Error Unit Tasks 
with Error Time Removed: 
  Derivation Data                 10     10.69     2.68 
  Crossvalidation Data            15     13.72     6.47 
  Total                           25     12.51     5.42 

Error-Free Unit Tasks: 
  Derivation Data                 26     11.99     4.55 
  Crossvalidation Data            19     11.57     3.62 
  Total                           45     11.81     4.14 
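
As a quick consistency check on Table 4.10, each Total mean should equal the N-weighted mean of the Derivation and Crossvalidation means. The sketch below carries out that arithmetic; the function and variable names are illustrative, and only the means (not the SDs) can be checked this way from the published summary figures.

```python
def pooled_mean(groups):
    """N-weighted mean of several groups given as (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# (Derivation, Crossvalidation) rows of Table 4.10 as (N, mean in sec).
TABLE_4_10 = {
    "All unit tasks": [(36, 13.37), (34, 19.46)],
    "Error unit tasks": [(10, 16.96), (15, 29.46)],
    "Error time removed": [(10, 10.69), (15, 13.72)],
    "Error-free unit tasks": [(26, 11.99), (19, 11.57)],
}

for label, groups in TABLE_4_10.items():
    print(f"{label}: {pooled_mean(groups):.2f} sec")
```

The four pooled values agree with the Total rows of the table to two decimal places.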

The data were also partitioned into the set of error-free unit tasks and 
the set of error unit tasks, each of the latter containing at least one 
identifiable error. The criterion for identifying an error is that the user 
takes some overt corrective action, i.e., an action that undoes the effect of 
a preceding action. The error time in an error unit task is the time it 
takes to perform the corrective action plus any preceding action that is 
undone by the corrective action. For example, the error time for a 
mistyped character is the time to type the bad character plus the time to 
type the control-A (which erases it). Thus error time is the time penalty 
for error. It can be seen in Table 4.10 that error unit tasks take more 
time and are more variable than error-free unit tasks. But, if error time 
is removed from the error unit tasks, then the remaining (non-error) time 
in the error unit tasks is comparable to the time for the error-free unit 
tasks (Mann-Whitney U(10,26) = 99.0, p > 0.10 for the Derivation data 
and U(15,19) = 142.0, p > 0.10 for the Crossvalidation data). Looking 
at error-free tasks alone, it can also be seen that the set of Derivation 
unit tasks does not differ significantly in time from the Crossvalidation unit 
tasks (U(19,26) = 180.5, p > 0.05). Thus, the two halves of the data 
are comparable for analysis. 

All of the analyses below will use the error-free data. The analysis of 
errors, while partially within the competence of the GOMS model (most 
errors being routine), requires a separate analysis (see Card, Moran, and 
Newell, 1976, for the beginnings of a GOMS analysis of errors). 

Fitting the Models to the Data 

The protocol record for the error-free Derivation unit tasks was coded 
into a sequence of operators from each manuscript editing model. For 
example, the Model M4B coding of the protocol segment in Table 4.9 
is: 

18:56.33 - 18:59.94   3.61 sec   GET-UT 

18:59.94 - 19:04.16   4.22 sec   USE-QS-METHOD 

19:04.16 - 19:10.72   6.56 sec   USE-S-COMMAND 

19:10.72 - 19:11.23   0.51 sec   VERIFY-EDIT 
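
The durations in this coding are simply differences between the boundary clock times. A small sketch of that computation follows, assuming the min:sec stamp format used in the protocol; the helper name and the list layout are illustrative.

```python
def to_seconds(stamp):
    """Convert a 'min:sec' clock stamp such as '18:56.33' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Boundary times of the Model M4B coding given in the text.
M4B_CODING = [
    ("GET-UT", "18:56.33", "18:59.94"),
    ("USE-QS-METHOD", "18:59.94", "19:04.16"),
    ("USE-S-COMMAND", "19:04.16", "19:10.72"),
    ("VERIFY-EDIT", "19:10.72", "19:11.23"),
]

durations = {
    name: round(to_seconds(stop) - to_seconds(start), 2)
    for name, start, stop in M4B_CODING
}
for name, secs in durations.items():
    print(f"{name:15s} {secs:5.2f} sec")
```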

To encode each operator requires a recognizer that determines 
whether the operator occurs in the data and, if so, what its boundary 
times are. Such recognizers are insensitive to many of the details of what 
happens. An odd MENTAL operator within a SPECIFY-CMD (at 
Level 2), a USE-QS-METHOD (at Level 4), or an EDIT-UT (at Level 
16) is quite consistent and is accepted by the recognizers for these 
operators. Thus, it is possible, and indeed it is the case, that the 
higher-level models account for all of the descriptive operators in the protocol. 
But these odd descriptive operators (e.g., the odd MENTAL) are not 
without consequence; they may show up as sequence errors and, in 
chronometric analysis, as variance in the higher-level operator times. 

The Level 0.5 models, on the other hand, must map one-to-one onto 
the protocol, since the Level 0.5 operators are at the same level of 
aggregation as the protocol operators. Many of the protocol operators 
(e.g., TYPE) are identical to the Level 0.5 operators and are identified 
directly, whereas other protocol operators (e.g., MENTAL) must be 
relabeled (e.g., as SEARCH-FOR, CHOOSE-CMD, etc., in Model 
M0.5E) to fit the models. The possibility then exists that there will be 
descriptive operators in the protocol that are not accounted for by the 
models. More often the descriptive operator, though a possible operator 
type in the models, may not correspond to any possible operator 
produced by the models at that point. This happens for 78 out of the 
581 operator instances in the protocol. The most significant kind of 
unaccounted-for operators are instances of MENTAL that cannot be 
interpreted as one of the M0.5E operators; these are labelled 
UNKNOWN. Of the unaccounted-for operators, 71 are UNKNOWNS, 6 
are MOVE-HANDs, and 1 is an ACTION. The time for all of the 
unaccounted-for operators in the protocol amounts to only 8% of the 
total time and 14% of the operator occurrences. Notice, in this regard, 
that the UNKNOWN operators have an extremely low mean (0.28 sec), 
many of them simply being brief waits between, or perhaps preparation 
for, other operators, though there is no way of assigning these in the 
present models. 

It sometimes happens that two mental operators (e.g., VERIFY-LOC 
and SPECIFY-CMD in M2A) are predicted by the model to occur in 
succession. In these cases there is a problem determining the boundary 
between them, for there is no overt indication in the data. Each operator 
type involved in such cases (e.g., VERIFY-LOC) was compared to 
instances of the operator where the boundaries were observable (i.e., 
instances where it was surrounded by non-mental operators). This 
comparison showed clearly that the operator times of these adjacent 
mental operators are not additive. These cases are treated as "combined 
operators," i.e., as if they were separate operator types; and they are 
given names indicating that they are combined (e.g., V+SC). In all there 
are four different combined operator types, two at Level 2 and two at 
Level 0.5. 

Accuracy of Sequence Predictions 

Each of the models was used to predict the sequence of operators in 
the recoded protocol. For method selections, the following rules, 
simplified from Table 4.8, were used: 

Selection rules for LOCATE goal: 

R1. Use the QS-METHOD as default. 

R2. Use the LF-METHOD if D < 5 lines. 

Selection rules for MODIFY goal: 

R1. Use the S-CMD as default. 
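
Written as code, the simplified rules above reduce to one threshold test and one constant. This is only a sketch of the rules as listed; the function names are made up for illustration, and D is taken to be the number of lines from the Current Line to the target line, as defined earlier.

```python
def select_locate_method(d):
    """LOCATE goal: R2 overrides the R1 default when the target is close."""
    if d < 5:                 # R2: use the LF-METHOD if D < 5 lines
        return "LF-METHOD"
    return "QS-METHOD"        # R1: default


def select_modify_method():
    """MODIFY goal: R1 is the only rule, so the S-CMD is always predicted."""
    return "S-CMD"


for d in (1, 4, 5, 12):
    print(d, select_locate_method(d), select_modify_method())
```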
In predicting operator sequences it is also necessary to fix the conditions 
under which the "optional" operators in some of the models would be 
fired. These operators mainly center around the question of when to 
invoke extra GET-FROM-MS operations, either implicitly (the /G 
versions of the SPECIFY operators in models M2B and M2D, see Table 
4.4) or explicitly (the GET-FROM goal in M0.5E, see Table 4.5). Since 
the conditions which cause extra GET-FROM-MS operations were not 
clear from the data, each option was decided such that exactly one extra 
GET-FROM-MS was predicted for any unit task. 

How well the predicted sequences fit the observed sequences was 
assessed in two ways. First, the set of frequencies with which each 
operator was predicted to occur was correlated with the set of frequencies 
which were observed in S13's data. Second, the operator sequences for 
each model were transformed into a matrix of transitions between each 
operator type; and the corresponding matrices were correlated cell by 
cell. 
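
The two fit measures just described can be sketched as follows: count operator frequencies and operator-to-operator transitions in the predicted and observed sequences, then correlate the corresponding counts. The sequences below are hypothetical (one LOCATE method selection is mispredicted), Pearson's r is assumed as the correlation, and all names are illustrative rather than taken from the study.

```python
from collections import Counter
from math import sqrt

OPS = ["GET-UT", "USE-QS-METHOD", "USE-LF-METHOD", "USE-S-CMD", "VERIFY-EDIT"]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def frequency_vector(seq):
    """Occurrence count of each operator type, in OPS order."""
    counts = Counter(seq)
    return [counts[op] for op in OPS]


def transition_vector(seq):
    """Flattened matrix of counts of transitions from operator a to operator b."""
    counts = Counter(zip(seq, seq[1:]))
    return [counts[(a, b)] for a in OPS for b in OPS]


# Hypothetical sequences for three unit tasks.
task = ["GET-UT", "USE-QS-METHOD", "USE-S-CMD", "VERIFY-EDIT"]
predicted = task * 3
observed = task + ["GET-UT", "USE-LF-METHOD", "USE-S-CMD", "VERIFY-EDIT"] + task

r_freq = pearson(frequency_vector(predicted), frequency_vector(observed))
r_trans = pearson(transition_vector(predicted), transition_vector(observed))
print(round(r_freq, 3), round(r_trans, 3))
```

The transition measure is the stricter of the two, since it is sensitive to the order of operators and not only to how often each occurs.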

Both correlations are plotted in Plate 4.1. The points for the different 
models are joined by lines indicating which models can be derived from 
which other models by refinement (see Table 4.2). The basic result is 
that the structural accuracy is extremely high (r > 0.99) on both 
measures for all the models except M2B, M2D, and M0.5E (r between 
0.62 and 0.90). These latter models are the ones which required 
predictions on when to do GET-FROM-MS operations. Overall, the 
correlations between predicted and observed operator frequencies range 
from .76 to 1.00; between predicted and observed transition frequencies, 
.62 to 1.00. The method selection rules were 94% accurate overall (one 
LOCATE method and one MODIFY method were wrongly predicted in 
the Derivation data and three LOCATE methods were wrongly predicted in 
the Crossvalidation data). However, method selection errors affect only a 
small percentage of the operators in the operator sequence, as can be 
seen in the high correlations of models M4B, M2A, and M2C. 



Plate 4.1. Accuracy of operator sequence prediction by models. 

4.32 Task Times 

How well can we predict the time it will take to edit the manuscript? 
The recodings of S13's protocol contain times from which it is possible to 
compute chronometric statistics for each operator in each model. 
Estimates of the time to perform a specific unit task were computed in 
two ways. (1) Given the observed sequence of operators, sum the mean 
times for each operator in the sequence. This estimate, which we shall 
call a "reproduction" of the data, gives us an upper bound on how well 
the models do. (2) Use the sequence of operators predicted by the 
models, summing the mean times for each operator in the sequence as 
before. This estimate, which we shall call a "prediction," should 
correspond with what we might expect to find applying the models in 
practice. Error can enter into the estimate either because an operator 
actually takes longer in some contexts than others, as the operator 
SPECIFY-ARG takes longer when specifying an argument to the 
Modify command in POET than to the Substitute command; or error 
can enter because the model predicts the wrong sequence of operators 
and the wrong sequence is predicted to take a different amount of time 
than the correct sequence actually requires. Comparing the prediction 
result with the result for the reproduction of the data gives us a way to 
assess where the model's most important sources of error lie. 
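
Both kinds of estimate come down to the same sum of mean operator times; they differ only in which operator sequence is summed. The sketch below uses the Model M4B means from Table 4.11. The two sequences are hypothetical, chosen to show a case where the user actually located the line with the LF-METHOD while the default rule predicted the QS-METHOD.

```python
# Mean operator times (sec) for Model M4B, read from Table 4.11.
MEAN_TIME = {
    "GET-NEXT-PAGE": 2.14, "GET-UT": 1.92,
    "USE-QS-METHOD": 3.94, "USE-LF-METHOD": 4.27,
    "USE-S-CMD": 3.63, "USE-M-CMD": 9.72, "VERIFY-EDIT": 1.49,
}


def estimate_time(operators):
    """Unit task time estimate: sum of the mean time of each operator."""
    return sum(MEAN_TIME[op] for op in operators)


# Hypothetical unit task, for illustration only.
observed_seq = ["GET-UT", "USE-LF-METHOD", "USE-S-CMD", "VERIFY-EDIT"]
predicted_seq = ["GET-UT", "USE-QS-METHOD", "USE-S-CMD", "VERIFY-EDIT"]

reproduction = estimate_time(observed_seq)    # uses the observed sequence
prediction = estimate_time(predicted_seq)     # uses the model's sequence
print(f"reproduction {reproduction:.2f} sec, prediction {prediction:.2f} sec")
```

In this example the wrong method selection changes the estimate by only the difference between the two method means, which illustrates why sequence errors need not produce large time errors.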

Estimation of Operator Times 

The durations of all operator instances of each operator type in the 
Derivation data were used to estimate the operator times. Table 4.11 is a 
table of the empirically determined duration for each operator of each 
model derived from the Derivation data. This table presents the 
operators in the order they are described in the manuscript editing 
models in Tables 4.3 to 4.5. The operators for Models M16A, M4A, 
M4B, M2A, and M0.5A are given in their entirety, so that some 
operators occur more than once. Only the operators unique to that level 
are given for the other models at Level 2 and Level 0.5. 

Since the data come from a naturalistic situation and since a rare 
method may appear only once in the data, there is a fair chance that 
some radically extreme times may show up in the distributions of 
operator times. Though these must be accepted in any prediction test, it 
is appropriate to avoid them in estimating the characteristics of the 
operators. Consequently, in Table 4.11 outliers that lie beyond two SDs 
from the raw mean have been dropped and the mean, SD, and CV 
recomputed for each operator. The number of high and low outliers 
dropped are indicated in the last column of the table. 
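
The trimming procedure can be sketched as a single pass: compute the raw mean and SD, drop instances more than two SDs from the raw mean, and recompute the statistics on what remains. The sketch assumes the sample (n - 1) standard deviation, which the text does not specify, and the operator times in the example are invented.

```python
from statistics import mean, stdev


def trimmed_stats(times, k=2.0):
    """Drop instances more than k SDs from the raw mean, then recompute.

    Returns the recomputed mean, SD, and CV, plus counts of low and high outliers.
    Uses the sample standard deviation (an assumption).
    """
    raw_mean, raw_sd = mean(times), stdev(times)
    low = [t for t in times if t < raw_mean - k * raw_sd]
    high = [t for t in times if t > raw_mean + k * raw_sd]
    kept = [t for t in times if abs(t - raw_mean) <= k * raw_sd]
    m, sd = mean(kept), stdev(kept)
    return {"mean": m, "sd": sd, "cv": sd / m,
            "n_kept": len(kept), "low": len(low), "high": len(high)}


# Hypothetical operator durations (sec) with one extreme instance.
stats = trimmed_stats([1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 6.0])
print(stats)
```

Note that the outliers are identified against the raw mean and SD only once; the procedure is not repeated on the trimmed data.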

All operators except one are modeled as taking constant time. The 
exception is the TYPE operator. While it is obvious that TYPE should 
be parameterized by the number of characters to be typed, we must be 
able to predict what search strings and what substitution strings will be 
used in order to capitalize on the parameterization. At the outset, it is 
unknown whether inaccuracies in predicting the string to be used will 
wipe out any gains from using a more accurate model. To find out, 
Model M0.5E' was defined to be like Model M0.5E in every way except 
that it uses a constant 0.39 sec, the average time per typing burst, as the 
time for TYPE. For Models M0.5A and M0.5E the time for TYPE is 







TABLE 4.11 
Operator Statistics For All Models 

Operator             Mean     SD      CV     %Time     N     %N    L,H 
                     (sec)    (sec) 

Model M16A 
  EDIT-UT            11.38    3.36    0.30   100%      26   100%   0,1 

Model M4A 
  GET-NEXT-PAGE       2.14    1.37    0.64     3%       5     5%   0,0 
  GET-UT              1.92    0.64    0.33    16%      24    23%   0,1 
  LOCATE-LINE         3.98    1.16    0.29    32%      24    23%   0,1 
  MODIFY-TEXT         3.85    1.54    0.40    35%      26    25%   0,1 
  VERIFY-EDIT         1.49    0.85    0.57    14%      26    25%   0,1 
  Average             2.77    1.07    0.41 

Model M4B 
  GET-NEXT-PAGE       2.14    1.37    0.64     3%       5     5%   0,0 
  GET-UT              1.92    0.64    0.33    16%      24    23%   0,1 
  USE-QS-METHOD       3.94    1.19    0.30    28%      21    20%   0,1 
  USE-LF-METHOD       4.27    1.05    0.25     4%       3     3%   0,0 
  USE-S-CMD           3.63    1.36    0.37    29%      24    23%   0,1 
  USE-M-CMD           9.72    6.10    0.63     6%       2     2%   0,0 
  VERIFY-EDIT         1.49    0.85    0.57    14%      26    25%   0,1 
  Average             2.83    1.12    0.41 

NOTE: Column N gives the total number of occurrences of each operator type in each model. 
Column %N gives the percentage of occurrences and Column %Time gives the percentage of total time 
of each operator type in each model. An initial set of statistics was computed for each operator 
type, and all instances lying more than two SDs from the mean were declared as outliers. Column 
"L,H" gives the number of low and high outliers. After discarding the outliers, the statistics for 
each operator type were recomputed. These are given in Columns Mean, SD, and CV (CV = 
SD/Mean). The rows labelled "Average" give the N-weighted average statistics for all operators in 
each model. 
TABLE 4.11 (continued) 

Operator             Mean     SD      CV     %Time     N     %N    L,H 
                     (sec)    (sec) 

Model M2A 
  GET-NEXT-PAGE       2.14    1.37    0.64     3%       5     3%   0,0 
  GET-FROM-MS         2.06    0.91    0.44     4%       6     3%   0,0 
  GFM+SC              1.80    0.40    0.22    12%      18    10%   0,1 
  VERIFY-LOC          1.94    0.87    0.45    12%      17     9%   0,1 
  V+SC                2.00    0.88    0.44     4%       7     4%   0,0 
  VERIFY-EDIT         1.49    0.85    0.57    14%      26    14%   0,1 
  SPECIFY-CMD         1.47    1.14    0.77    13%      28    15%   0,0 
  SPECIFY-ARG         1.46    0.84    0.57    38%      76    42%   0,3 
  Average             1.60    0.84    0.55 

Model M2B   The operators in M2B are the same as in M2A except for 
            SPECIFY-CMD and SPECIFY-ARG, which are expanded below. 
  SPECIFY-CMD/NG      0.40    0.35    0.88     2%      11     6%   0,1 
  SPECIFY-CMD/G       2.03    0.99    0.49    11%      17     9%   0,0 
  SPECIFY-ARG/NG      1.29    0.70    0.54    29%      63    34%   0,3 
  SPECIFY-ARG/G       2.28    1.02    0.45    10%      13     7%   0,0 
  Average             1.59    0.76    0.51 

Model M2C   The operators in M2C are the same as in M2A except for 
            SPECIFY-ARG, which is expanded below. 
  SPECIFY-ARG/Q       2.07    0.57    0.28    14%      21    11%   1,1 
  SPECIFY-ARG/S1      1.34    0.94    0.70    12%      24    13%   0,1 
  SPECIFY-ARG/S2      0.94    0.29    0.31     8%      24    13%   0,2 
  SPECIFY-ARG/M       2.04    1.36    0.67     5%       7     4%   0,0 
  Average             1.61    0.80    0.50 

Model M2D   The operators of M2D are the same as in M2B except for 
            SPECIFY-ARG, which is expanded below. 
  SPECIFY-ARG/Q/NG    1.94    0.42    0.22     9%      14     8%   1,1 
  SPECIFY-ARG/Q/G     2.29    0.75    0.33     5%       7     4%   0,0 
  SPECIFY-ARG/S1/NG   1.12    0.73    0.65     9%      21    11%   0,1 
  SPECIFY-ARG/S1/G    2.79    0.96    0.34     3%       3     2%   0,0 
  SPECIFY-ARG/S2/NG   0.93    0.29    0.32     8%      23    13%   0,2 
  SPECIFY-ARG/S2/G    1.20    -       -        0%       1     1%   0,0 
  SPECIFY-ARG/M/NG    2.05    1.21    0.59     3%       5     3%   0,0 
  SPECIFY-ARG/M/G     2.02    2.29    1.13     1%       2     1%   0,0 
  Average             1.60    0.70    0.47 
TABLE 4.11 (continued) 

Operator             Mean     SD      CV     %Time     N     %N    L,H 
                     (sec)    (sec) 

Model M0.5A 
  MENTAL              0.62    0.54    0.88    60%     260    43%   0,12 
  TYPE                0.39(a) 0.12(a) 0.31(a) 22%     173    28%   - 
  LOOK-AT             0.31    0.10    0.32    13%     139    23%   9,1 
  HOME                0.52    0.11    0.22     2%       9     1%   0,1 
  TURN-PAGE           0.67    0.21    0.32     1%       5     1%   0,0 
  MOVE-HAND           0.19    0.17    0.91     1%      17     3%   0,1 
  ACTION              0.13    0.19    1.56     0%       6     1%   0,0 
  EXPRESSION          0.23    -       -        0%       1     0%   0,0 
  Average             0.47    0.30    0.58 

Model M0.5E   The operators for M0.5E are the same as in M0.5A except for 
              MENTAL, which is expanded below. 
  SEARCH-FOR          0.72    0.51    0.71     7%      28     5%   0,1 
  SF+CM               1.07    0.56    0.52     7%      20     3%   0,0 
  CHOOSE-CMD          0.74    0.42    0.57     2%       8     1%   0,0 
  CHOOSE-ARG          0.41    0.33    0.81     9%      56     9%   0,4 
  COMPARE             1.01    0.83    0.82    22%      59    10%   0,3 
  C+CM                1.14    0.68    0.60     7%      18     3%   0,0 
  UNKNOWN             0.28    0.25    0.92     8%      71    12%   0,2 
  Average             0.48    0.29    0.55 

(a) Since TYPE is a parameterized operator, the number given in the SD column is the 
Standard Error of the parameterized typing prediction, given in the equation in Section 
4.31. The Mean for TYPE is the mean of all observed TYPE operations, i.e., as if TYPE 
were a constant operator. The CV given for TYPE is SE/Mean. 
parameterized by the number of shift characters N_shift, carriage returns 
N_cr, and other characters N_other, according to the equation 

T = 0.05 + 0.11 N_shift + 0.19 N_cr + 0.12 N_other sec. (1) 

The equation is based on the regression fit of 157 short typing bursts 
from the experiment (1 to 18 characters in a burst, mean 3.8 characters). 
This model predicts the typing time rather well (R^2 = 0.92, all 
coefficients significantly different from 0 at p < 10"*), and it is somewhat 
better than the simpler model: 

T = 0.06 + ^llN^j^^ sec 

(R^2 = 0.89). That S13 is a fast typist is apparent from these equations 
(0.12 sec per character = 91 words per minute). 
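
The conversion from seconds per character to words per minute depends on how many characters are counted as a word, which the text does not state. The sketch below shows the arithmetic; the value of 5.5 characters per word is an assumption inferred from the quoted figure, and the function name is illustrative.

```python
def words_per_minute(sec_per_char, chars_per_word):
    """Typing rate implied by a constant time per character."""
    return 60.0 / (sec_per_char * chars_per_word)


# 0.12 sec per character, under two possible word-length conventions.
print(round(words_per_minute(0.12, 5.5)))   # assumed convention: 5.5 chars/word
print(round(words_per_minute(0.12, 5.0)))   # common alternative: 5 chars/word
```

With 5.5 characters per word the result rounds to the 91 words per minute quoted above; a 5-character convention would give 100.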

Accuracy of Time Predictions 

Plate 4.2 presents the results of reproducing the Derivation data with 
each of the models (solid dots). Reproduction of the Derivation data is the 
weakest prediction because it derives the estimates of operator times from 
the same data that is being predicted and because the operator sequence 
is taken as given and not predicted by the independently derived 
selection rules. It is useful because it gives an upper bound on how well 
the models can be expected to predict other data. The accuracy of the 
reproduction is evaluated by comparing predicted unit task times with 
the user's actual unit task times. The average prediction error for each 
unit task is summarized by the root-mean-square of the prediction errors 
(RMSE) expressed as a percentage of the average actual unit task time. 
The reproduction accuracy improves as the models become more 
detailed, but the rate of improvement diminishes. The average 
prediction error (RMSE) is about 40% when using the average unit task 
time as the predictor (Model M16A), and this is cut to 20% by using the 
most detailed model (M0.5E). 
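
The error measure used here can be written out directly: take the root-mean-square of the per-task prediction errors and divide by the mean actual unit task time. The sketch below follows that definition; the function name and the sample times are hypothetical.

```python
from math import sqrt


def rmse_percent(predicted, actual):
    """RMS prediction error as a percentage of the mean actual unit task time."""
    n = len(actual)
    rmse = sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
    return 100.0 * rmse / (sum(actual) / n)


# Hypothetical unit task times (sec), for illustration only.
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 11.0, 14.0]
print(f"{rmse_percent(predicted, actual):.1f}%")
```

When every task is predicted by the mean of the actual times, as Model M16A does, this measure reduces to the coefficient of variation of the task times. The error-free Derivation row of Table 4.10 (SD 4.55, mean 11.99) gives about 38%, which is roughly consistent with the 40% reported for M16A.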

Plate 4.2 also presents the results of using the operator sequences 
predicted by the models as the basis for predictions of the times per unit 
task for both the Derivation and the Crossvalidation data sets. M0.5E 
was not applied to the Derivation data because, given our intimate 
familiarity with that data, we could not fairly a priori predict the TYPE 
strings. 

Plate 4.2. Accuracy of unit task time prediction by models. 

The main result is that the models except M16A on the Derivation 
data are all about equally accurate at prediction, with an RMSE of about 
30%. A study of the prediction errors on unit tasks with different task 
environment features revealed that the only task environment feature that 
allowed any prediction gain was when the unit task shared the same line 
with another unit task. There are two tasks with this feature in the 
Derivation data, and they are the sole reason why M16A predicts more 
poorly on that data. Conversely, the lack of such features in any of the 
Crossvalidation unit tasks allows M16A to do better on that data. 

The prediction of the unit task times of the data is slightly less 
accurate than the reproduction of these times. Since the predicted time 
per operator is the same in both cases, the difference is due to the 
difference between the predicted and the actual operator sequences. 

If the RMSE measure is interpreted as the average prediction error, 
the 20% to 40% range of the models may seem to be high. But 
predicting editing times unit task by unit task for a single user is a very 
stringent test. If the unit of prediction were the whole manuscript rather 
than the unit task, then the prediction error would drop considerably, 
since the high and low predictions of the various unit tasks would tend to 
cancel each other. The RMSE approximately obeys a square root of n 
law, where n is the number of unit tasks.² Thus the RMSE for 
predicting the time to edit the whole manuscript with the reproduction 
models would range from $(40\%)(64)^{-1/2} = 5\%$ for the worst model to 
2.5% for the best. The RMSE for the prediction models would be 
around 3-4%. The error for these models of variable-sequence, cognitive 
activity would thus seem to be in the same range (~ 5%) as that often 
cited for pre-determined time system predictions of invariant-sequence, 
physical activity by industrial engineers (Maynard, 1971). 
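
The square root of n scaling used in this paragraph can be written out as a small sketch. It assumes, as the argument does, that the errors on the individual unit tasks are roughly independent; the function name is hypothetical, and the inputs are the percentages and the count of 64 unit tasks quoted above.

```python
import math


def whole_manuscript_rmse(unit_task_rmse_percent: float, n_unit_tasks: int) -> float:
    """Scale a per-unit-task RMSE (%) down to the whole manuscript.

    Assumes independent unit task errors, so the relative error of the
    total shrinks by a factor of sqrt(n).
    """
    return unit_task_rmse_percent / math.sqrt(n_unit_tasks)


N_UNIT_TASKS = 64  # number of unit tasks used in the text's calculation

for label, rmse in [("worst reproduction model", 40.0),
                    ("best reproduction model", 20.0),
                    ("prediction models", 30.0)]:
    print(f"{label}: {whole_manuscript_rmse(rmse, N_UNIT_TASKS):.2f}%")
```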



4.4 DISCUSSION 

Assessment of the Models 

Description of Behavior. From the case study of S13's protocol, it is 
apparent that descriptions of a user's behavior in the manuscript editing 
task can be constructed from a reasonably small number of components. 
Depending on the grain of analysis, the behavior could be described by 
1-20 goals, 1-13 operators, 4-6 methods, and 2-4 selection rules. 
Moreover, this description is a reasonably accurate account of a user's 
behavior in the task. The selection rules were able to predict the user's 
choice of methods about 90% of the time. The frequency mix of 
different operators predicted by the model correlated well (r = 0.76 to 
1.00) with the observed mix. The operator to operator transition matrix 
predicted by the models matched well (r = 0.62 to 1.00) the transition 
matrices calculated from the data. Thus, it would appear that the 
behavior sequence of a user in a naturalistic text-editing situation can, in 
fact, be described as repetitions of patterns built from a handful of 
components by the GOMS models. 

Prediction of Task Times. The GOMS models likewise provide a 
reasonable prediction for the amount of time a task consumes. The 
models were able to predict the time per unit task to within about 30% 
on new (Crossvalidation) data. In this situation, the models had to 
predict the times for all the operators as well as the operator sequence. 
Predicting the unit task times to within 30% is equivalent to predicting 
the time to edit the whole manuscript to within about 4%, an accuracy 
comparable to that achieved by industrial engineering "predetermined 
time" systems for repetitive manual operations. 

Grain of Analysis. How does the ability of the GOMS models to 
predict the behavior of the user vary as a function of the grain of 
analysis? The short answer is: When the sequence of operators is given, 
refinement of the grain size improves the ability to predict performance 
times. But when the sequence of operators must be predicted, 
refinement of the grain size causes the ability to predict the sequence of 
the refined operators to deteriorate (Plate 4.1); ability to predict 
performance times does not change (Plate 4.2). 

Because of the very large amounts of data analysis required, it was 
not feasible to compute directly the reproduction of the Crossvalidation 
data. The small differences found between the results of the prediction 
on the Derivation data and on the Crossvalidation data, however, suggest 
that the reproduction of the Crossvalidation data would parallel those for 
the Derivation data and that the improvement of time accuracy with 
refinement of analysis grain is not solely attributable to capitalization on 
chance. 

If the models could predict operator sequences perfectly, then the 
prediction curves in Plate 4.2 would drop to the reproduction curve. 
That the prediction curves are essentially horizontal implies that refining 
the grain of analysis did not tap the sources of time variability. In the 
models, variability is expressed in the method selection rules and 
optionality conditions, which are triggered by features of the task 
environment. Thus, either the models didn't capitalize on all the 
available features in the task environment, or there were no task 
environment features that gave clues to the variability. 

In fact, it is important to note that the variability in the set of unit 
tasks in the experiment is quite small, both with respect to the user's 
performance times (see Table 4.10) and with respect to the range of edit 
tasks on the manuscript; all are small edits of about the same 
complexity. This low variance was intentional, since we were not trying 
to manipulate the task environment, but were trying to assess the natural 
variability in the data and the ability of various models to deal with it. 
It appears that while the models, as a whole, were not bad at predicting 
the average time per unit task, there was insufficient variation within the 
editing tasks to trigger increased responsiveness from the finer grain 
models. 

The level of variability for which the models can account begins to 
show up in the difference between M16A and M4A in predicting the 
Derivation data and between M16A, M4A, and M4B in the 
reproductions. The models do improve in performance down to about 
Level 4. Thus some variability is expressible in terms of combinations of 
Level 4 operators. 

General Issues of Model Structure 

What psychological reality is to be ascribed to the various components 
and features of the GOMS model? 

Goals. The occurrence of goals in the GOMS model is one of its 
primary cognitive features. Goals are required in generating the model 
and in supporting its rational character as behavior directed towards the 
end of editing the manuscript. As it stands, however, the goals do not 
make any distinguishable contribution to the timing calculations of the 
various models. Technically, this arises from a confounding of goals and 
methods/operators, so that any time assigned to creating a goal or to 
cleaning up and disposing of a goal would not be distinguishable from 
additional time in the associated operators. Goal-manipulation 
operations should not take longer than about 0.5 sec, so that goal 
operators should not show up at the Level 2 or above in any event. 

The confounding of goal manipulation times results in part from 
GOMS being a model of skilled behavior, so that the overt record 
contains evidence only of the sequence of effective actions. This can be 
confirmed from the verbal expressions made during manuscript editing. 
In our user, there are no verbal expressions that indicate goal activity. 
However, protocols from inexperienced users are sprinkled with goal 
statements that correspond to the goals in the GOMS model. In one 
such experiment, when the model predicted the processing of GOAL: 
USE QS-METHOD, the user would almost invariably make comments 
like: "Okay, I want to get down to a line that starts with 'Food store' ". 
When the model predicted GOAL: USE S-CMD, the user would say: 
"Now I want to substitute '30' for '39' ". But in line with this view no 
verbalizations occurred related to operator actions like TYPE. 

Uniformity of Level. Taking a model at a given grain size, like 2 sec, 
tends to create a set of operators which are homogeneous in their 
duration. Operators much larger than the grain size are always 
decomposed. Operators much smaller than the grain size can be 
discarded on grounds of making only insubstantial contribution to the 
total time. 

From a practical point of view, it might be just as effective to use a 
single operator time for all operators at a given grain size. There is some 
indication (Wainer, 1976; Claudy, 1972) that such a procedure would not 
materially reduce the prediction accuracy, while increasing the robustness 
of the estimate and enhancing the use of such models in applied work. 

Operator Variability. The order of precision of our operators, as 
measured by the coefficient of variation (CV = the standard 
deviation/mean), ranges from about 0.9 at Level 0.5 to 0.3 at Level 16. 
In general, CV should be expected to decrease with increasing mean in 
situations with compositions of sums of elements. The actual decrease is 
illustrated in Plate 4.3, which plots CV against operator mean (M). The 
open symbols represent operators from Table 4.11 that occurred more 
than 5 times in S13's protocol. The other symbols on the graph are from 
measurements of industrial operations in which there is some significant 
cognitive element. The solid circles are operators in a ladies' garment 
factory, such as cutting and stitching clothing patterns, from Abruzzi 
(1956, pp. 222 & 235); the crosses are from a study by Mills and Hatfield 
(1974) of a data lookup and entry task. In log-log coordinates the 
relationship between mean and CV is essentially linear. In fact, a 
regression analysis of the data in Plate 4.3 gives the following equation: 

$$\log CV = -0.340 - 0.211 \log M$$

or 

$$CV = 0.457\,M^{-0.211}$$



This equation explains 58% of the variance, and the coefficient -0.211 is 
very significantly different from zero (t(82) = -10.74, p < 10"*). Plate 
4.3 suggests that in absolute terms the CVs observed in our experiment 
are roughly what would be expected from the size of the operations 
alone. 
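
The fitted relationship can be evaluated directly. The sketch below assumes that the logarithms in the regression are base 10 (an assumption, though it is the base for which 10 raised to -0.340 gives the 0.457 coefficient of the power-law form) and that the operator mean M is in seconds. The function name is hypothetical.

```python
def predicted_cv(mean_sec: float) -> float:
    """Coefficient of variation expected from operator size alone.

    Power-law form of the regression line: CV = 0.457 * M ** -0.211.
    """
    return 0.457 * mean_sec ** -0.211


# Base-10 logs are assumed: 10 ** -0.340 reproduces the 0.457 coefficient.
print(f"10 ** -0.340 = {10 ** -0.340:.3f}")

# Evaluate the fitted line at a few operator means (in seconds).
for mean in (0.5, 1.0, 4.0, 16.0):
    print(f"M = {mean:5.1f} sec -> fitted CV = {predicted_cv(mean):.2f}")
```

The fitted line is an average over all the plotted operations, so individual operators can be expected to fall above or below the values it gives.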

As CV increases, the number of observations needed to estimate the 
mean to a fixed precision also increases (see Abruzzi, 1956). This is 
reflected in the figure as greater dispersion of the points about the line 
for small M, and in the fact that many of the points on the outlying edge 
are those with the lowest Ns. 

The dispersion of the points in Plate 4.3 also reflects the tendency of 
physical operators to have lower CVs than mental operators of the same 
mean. In the figure nearly all of the purely mental operators (indicated 
by triangles) lie above the regression line while the purely physical 
operators (indicated by squares) lie below. The outlying points below the 
regression line are simple physical acts such as LOOK and TURN-PAGE. 
The outlying points above the line are mental actions such as 
CHOOSE-CMD or VERIFY-EDIT. As the time for the operators 
becomes shorter, approaching the grain of characteristic physiological 
events, the operators tend to become more purely physical or mental. 
Since the physical operators are easier to identify and measure, these 
should have lower CVs. 



4.5 CONCLUSION 

In this chapter we have sought to refine our understanding of the 
manuscript editing task by examining in detail the parts of the 
modification cycle. We have examined one editor (POET) in its natural 
setting, proposed a theory for it (GOMS), and used that theory to 
describe the behavior. We have used the theory to perform various 
reproductions and predictions, to assess the stability of the constructs 
estimated in the theory and to identify sources of naturally occurring 
variability. We have restricted our investigation to a single subject and a 
single session so that we could carry out an intensive analysis of the data, 
but our results are supported in general by other subjects performing 
manuscript editing tasks in our laboratory. 

We can now proceed to give answers to our original questions. 

(1) Can the behavior of a user in a manuscript editing task be 
described as the combination of a small number of 
elementary behavioral acts? Yes. The error-free behavior 
in manuscript editing is satisfactorily described by the 
GOMS model. This description is built up from a small 
number of operators, goal types, methods, and selection 
rules; and these components reflect the structure of the 
manuscript editing task environment. The number of 
operators required depends on the grain of the model but 
was never more than about 16. We have shown how the 
model can be extended to handle most error-handling 
behavior. 

(2) Can we predict the stream of these acts from an analysis of 
the environment? Use of the model to predict new but 
similar data (zero-parameter prediction) reveals that the 
sources of variability are difficult to tap. Models at all 
levels yield about the same quality of predictions to new 
data. The relatively low variance in the data makes the 
results not surprising, though somewhat disappointing. 
Viewed negatively, these results say that measurement in an 
essentially identical situation is as good a predictor as the 
theory; viewed positively, these say that predictions can be 
made from component operators at any level, even if direct 
measurements are unavailable. Even in the harshest 
prediction test, the first order refinement (M4) was 
superior to the constant time per task model used in 
Chapter 2. 

(3) Finally, what grain size is appropriate for modeling? The 
behavior can be satisfactorily described at several levels 
(time grains), which constitute consistent decompositions 
within the GOMS model. No really preferred level of 
description was found, reflecting in part that the user's 
behavior is organized at all of these levels, as the GOMS 
model asserts. Models increase in generality (task 
independence) as level decreases; they increase modestly in 
descriptive power; but they become more expensive to 
construct. 

Probably the most important feature of the manuscript editing task 
to emerge is its unit task structure, which sets it apart from other tasks 
(typing, reading, tracking) that have received more attention. The 
practical usefulness of the unit task concept will be demonstrated in the 
next chapter. 
NOTES 

¹ RMSE = $[\sum E^2/(n-1)]^{1/2}$, where E is the distribution of prediction 
errors over the unit tasks. RMSE is the standard deviation of E about 
zero, instead of the actual mean of E, and thus RMSE ≥ SD(E). If 
Mean(E) = 0, then RMSE = SD(E), and RMSE is equivalent to the 
standard error. The calculation should actually be done with SD(E) 
about Mean(E), but the use of the RMSE is approximately correct if 
Mean(E) is close to zero. 

² In attempting to build a statistical model to partition the variance 
for each model into the operator variance and the structural variance 
(i.e., the conditionality of the operator sequence), some of the variance 
remains unaccounted for. Part of the difficulty, apparently, is the 
existence of covariances between non-adjacent operators, even in the 
Level 4 models. Hence, we must reserve judgement about the 
independence of the operators. 

³ We would like to acknowledge George Baylor, from the University 
of Montreal, for helping us formulate the method selection issue and for 
running some pilot experiments. 
In this chapter we apply some of the principles of Chapter 4 to a 
typical industrial systems design problem. 

Adapted from an unpublished memorandum by Card, Moran, and Newell. 

5.1 STATEMENT OF THE PROBLEM 

A computer-based system is to be built in which it will be possible 
for a user to do the layout for a journal roughly in the style of Cognitive 
Psychology. The user, starting from separate on-line files of main text, 
figures, captions, and footnotes, has the task of arranging all of these 
elements into a final page. This job is called "formatting" and it 
includes as well the rendering of text words into different fonts; the 
treatment of headings; and the final numbering of figures, 
cross-reference pages, and footnotes (see Figure 5.1). The problem is to make 
a rough order-of-magnitude estimate of the time per page a user would 
be expected to spend formatting. Since the final command dialogue has 
not yet been designed, the estimate cannot depend on its details. 

5.2 UNIT TASKS SOLUTION 

The problem can be solved by recalling from Chapter 4 that the 
user's behavior is organized into "unit task cycles." By estimating the 
number of unit tasks necessary to accomplish the formatting of a typical 
page and multiplying that number by the estimated time per unit task, 
the formatting time/page can be estimated. 

Step 1. Analyze the problem into unit tasks. 

Chapter 4 found that a user's behavior in editing is organized into a 
sequence of unit task cycles of the same basic operations. Taking into 
account that the user may have to await the response of the system 
before he can proceed, the cycle can be written 

GET-NEXT-TASK (G) 

LOCATE-TASK-ELEMENTS (L) 

MAKE-MODIFICATION (M) 

WAIT-FOR-SYSTEM (Optional) (W) 

VERIFY (V) 

A typical subtask the user must perform is to specify to the system 
the type font of a section heading. We do not know the exact 
commands he will use since the system has yet to be specified, but it 
can be expected that the necessary actions will fall into a unit task 
pattern: 

Figure 5.1. Conceptual design of problem system 

Thus, the first step in solving the problem is to analyze the task 
environment into these unit tasks. The unit tasks in the current 
problem have been entered onto Form A in Figure 5.2. There are 34 
unit tasks centered around starting a new page, headings, the text body, 
special treatments of text words, figures, captions, and footnotes. 

Step 2. Identify a set of Macro Operations. 

Often the user will be performing a number of related tasks in 
sequence. For every heading he inserts into the text he will need to 
specify the font in which the heading is to be rendered, to specify the 
vertical placement, to specify the horizontal placement, and to specify 
the heading number. We can simplify the analysis by identifying a 
macro operation, Process heading (abbreviated #Ph). As macro 
operations are identified they are written in the heavy boxes of Form B 
(Figure 5.3). For the current problem, five macro operations can be 
identified: Process new page (abbreviated #Pp), Process heading 
(#Ph), Process figure (#Pf), Process footnote (#Pn), and Process 
reference (#Pr). 



5.22 Analysis of System 

Step 3. List unit task components of the macro operations, recording 
system assumptions. 

In order to proceed further, it is necessary to specify some broad 
characteristics of the system. Will it be necessary for the user manually 
to number each page? Need he manually indent each paragraph? Even 
though a system is in the pre-specification stage, it may be possible to 
settle on these general issues. If not, then alternative systems can be 
specified so as to examine the range in behavior induced by the 
alternative designs. The latter course will be followed here. We shall 
analyze two systems for the problem: System A, in which the user must 
do a great deal by hand, and System B, which is more automatic. 
FORM A

Enumeration of Unit Tasks

ID    Description                                                     Wait
Lp    Load page                                                       X
Vp    Specify vertical extent of page
Hp    Specify horizontal extent of page
Np    Specify page number
Lh    Load heading                                                    X
Vh    Specify vertical spacing of heading
Hh    Specify horizontal placement of heading
Fh    Specify heading font
Nh    Specify heading number
Lhk   Load constant heading                                           X
Vhk   Specify vertical spacing of constant heading
Hhk   Specify horizontal placement of constant heading
Fhk   Specify constant heading font
Lb    Load body of textual material                                   X
Vb    Specify vertical placement of text body
Hb    Specify horizontal placement of text body (e.g., indent)
Fb    Specify font of text body
Nb    Specify numbering of text body
Lt    Load text word                                                  X
Ft    Specify font of text word
Nt    Specify number for text word (e.g., footnote number)
Lf    Load figure                                                     X
Vf    Specify vertical placement of figure                            X
Hf    Specify horizontal placement of figure                          X
Lc    Load caption                                                    X
Vc    Specify vertical placement of caption                           X
Hc    Specify horizontal placement of caption                         X
Fc    Specify font of caption
Nc    Specify figure number (which goes with caption)
Ln    Load footnote                                                   X
Vn    Specify vertical placement of footnote                          X
Hn    Specify horizontal placement of footnote                        X
Fn    Specify footnote font
Nn    Specify footnote number (with note body, not callout in text)

Figure 5.2. Unit tasks for problem 
FORM B                                                       Page 1 of 2

Macro Operations                                             System: A

              ID    Description                              Waits  UTs
MACRO OP      #Pp   Process new page                         1      2
COMPONENT     Lp    Load page                                X
UNIT TASKS    Np    Specify page number

MACRO OP      #Ph   Process heading                                 4
COMPONENT     Fh    Specify font of heading
UNIT TASKS    Vh    Specify vertical spacing of heading

Figure 5.3. Macro operations for System A 



FORM B                                                       Page 2 of 2

Macro Operations                                             System: A

              ID    Description                              Waits  UTs
MACRO OP      #Pn   Process footnote                         3      6
COMPONENT     Ln    Load footnote                            X
UNIT TASKS    Vn    Specify vertical location of footnote    X
              Hn    Specify horizontal location of footnote  X
              Fn    Specify footnote font
              Nn    Number footnote body
              Nt    Number footnote callout

MACRO OP      #Pr   Process reference                               2
COMPONENT     Ft    Italicize journal name
UNIT TASKS    Ft    Render volume number in boldface

Figure 5.3. (continued) 
System A will leave to the user the loading of a page and the 
processing of headings, figures, footnotes, and miscellaneous numbering 
and word changes. In System B, more work will be taken over by the 
system. 

The analyst proceeds as follows: for each macro operation on Form 
B, he fills out the unit tasks (from Form A) of which it is composed. 
This activity is naturally accompanied by decisions and assumptions 
about the characteristics of the subject system. As each decision or 
assumption is reached it is entered on Form C (Figure 5.4). 

The first macro operation in Figure 5.3 is Process new page. In 
System A, the user must specify the page number himself. So there are 
two component unit tasks for process page: Load page and Specify page 
number. But in System B, the system takes care of page numbering 
automatically, so only Load page is entered. A note about the automatic 
page numbering is made on Form C. 

Step 4. Compute the number of unit tasks/operation 

Using the above assumptions regarding the system, we can count the 
number of unit tasks from the list in Figure 5.2 necessary to perform 
each macro operation: #Pp (Process new page) requires 2 unit tasks, 
#Ph (Process heading) 4, #Pf (Process figure) 8, and #Pn (Process 
footnote) 6. 

Step 5. Compute the number of waits/operation 

Examination of the unit tasks on Form A, keeping in mind the sort of 
technology to be used for system implementation, will suggest which unit 
tasks should have WAIT operators attached to them. Lp {Load page) 
will likely be one unit task that will require the user to wait while a new 
page is loaded in and displayed on the screen, whereas it is expected that 
Np {Specify page number) will be executed without noticeable delay. 
Those unit tasks in Form A for which some wait by the user is expected 
have been so indicated. 
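
Steps 4 and 5 amount to a simple count over the component lists, which can be sketched as follows. The component lists are those shown on Form B for three of the System A macro operations, and the wait markings are taken from Form A; the variable and function names are hypothetical.

```python
# Unit tasks marked with a wait on Form A (only those used below).
WAITING_UNIT_TASKS = {"Lp", "Ln", "Vn", "Hn"}

# Component unit tasks of three System A macro operations, as listed on Form B.
MACRO_OPERATIONS = {
    "#Pp": ["Lp", "Np"],                          # Process new page
    "#Pn": ["Ln", "Vn", "Hn", "Fn", "Nn", "Nt"],  # Process footnote
    "#Pr": ["Ft", "Ft"],                          # Process reference
}


def count_unit_tasks_and_waits(components):
    """Return (number of unit tasks, number of waits) for one macro operation."""
    n_unit_tasks = len(components)
    n_waits = sum(1 for unit_task in components if unit_task in WAITING_UNIT_TASKS)
    return n_unit_tasks, n_waits


for name, components in MACRO_OPERATIONS.items():
    n_u, n_x = count_unit_tasks_and_waits(components)
    print(f"{name}: {n_u} unit tasks, {n_x} waits")
```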

Step 6. Estimate the time/unit task and time/wait 

A reasonable assumption for the time/unit task required by users of 
System A is that it would be in the range of the two display-based 
editors measured in Chapter 2. Combining the time/unit task in Table 
2.2 for correct tasks with Editor Y and RCG yields (10.1 + 7.2)/2 = 
8.65, which we round to 8.5 sec/unit task. 
FORM C

System Characterization                                      System: A

ID    Description
A1    Typed headings embedded in text (no Lh)
A2    Text loaded on call (Lt) with page size, margins, position, and
      font automatic (no Vp, Hp, Vb, Hb, Fb)
A3    Paragraphs need to be indented manually
A4    Either one or two figures fit together horizontally
A5    Text cannot go horizontally beside figure
A6    Figure can go vertically either at top, at bottom, or in middle
A7    When figure is moved, overlaid text automatically slides around,
      possibly onto next page
A8    All headings must be set manually
A9    All numbers (Nt) are kept track of by the system, but must be
      placed by user

Figure 5.4. Characterization of System A 

In order to set a number for the time/wait, a number of exercises 
were performed, using various display-based systems, in which the user 
would have to wait for a system response. The times to load a file of 
text, to move text on the screen, and to load and move various sorts of 
pictures were measured. While the extremes of these times ranged from 2 sec 
to 425 sec, a large number clustered around 6 sec/wait and that number 
was selected for use. Both of these numbers are entered on Form C 
(Figure 5.4). 

5.23 Analysis of Ecology of Data 

Step 7. Estimate frequency of operations 

The different activities which might engage the user require different 
times to perform. The average time per page required by the user 
depends on the relative frequency with which he is required to perform 
these activities. 

For the purposes of this problem a small sample was taken of articles 
appearing in Cognitive Psychology. For each article, the number of pages 
was counted as well as the number of headings, figures, footnotes, and 
paragraphs. The number of text segments requiring a special font 
treatment, such as italicized words, was counted (excluding those text 
segments already counted elsewhere, such as headings). Finally, the 
number of references was counted. These statistics are entered onto 
Form D (Figure 5.5). Each statistic is divided by the number of pages in 
the article and the result averaged over all the articles. 
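
The frequency estimate described here, a per-article rate averaged over the articles, can be sketched as below. The heading rates are the four items/page values recorded for headings in Figure 5.5, and the raw count of 30 headings in 22 pages is the entry for the third sample; the function names are hypothetical.

```python
def items_per_page(item_count: int, page_count: int) -> float:
    """Rate of occurrence of one kind of item within a single article."""
    return item_count / page_count


def mean_rate(rates) -> float:
    """Average the per-article rates over all sampled articles."""
    return sum(rates) / len(rates)


# Third sample article: 30 headings in 22 pages.
print(f"sample 3 headings/page = {items_per_page(30, 22):.2f}")

# Headings per page for the four sampled articles (Figure 5.5).
heading_rates = [0.68, 1.88, 1.36, 0.58]
print(f"mean headings/page = {mean_rate(heading_rates):.3f}")
```

Averaging the per-article rates gives each article equal weight regardless of its length, which is the procedure the text describes; pooling all counts and dividing by the total number of pages would weight long articles more heavily.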

For many analysis purposes, these numbers would be sufficient. It is 
the claim of some industrial engineers who do time and motion studies 
on office work that sufficient accuracy is obtained by examining a single 
example of the work to be done as long as the example is not untypical 
(Birn, Crossan, and Eastwood, 1961; p. 288). For more precise results, 
one could examine a larger sample. 

5.24 Compute Results 

Step 8. List the operations needed for the problem 

At this point, the subsidiary calculations and data gathering have been 
completed and the final calculation can be made. The formatting of a 
page in System A will require these operations: 
FORM D

Frequency of Operations

              Sample  No.     Items Counted
              No.     Pages   Head.  Fig.   F.note  Para.  Font.  Refs.
RAW COUNT     1       31             9      4       72     39
              2       32      60     11     6       92
              3       22      30     5      4       84     94
              4       12                    3       45     90

ITEMS/PAGE    1       31      0.68   0.29   0.13    2.32   1.26   3.03
              2       32      1.88   0.34   0.18    2.62   5.91   1.34
              3       22      1.36   0.22   0.18    3.82   4.27   1.09
              4       12      0.58   0.41   0.25

Figure 5.5. Frequency count for four journal articles 

Five of these are macro operations, two are single unit tasks. Each of 
these operations is entered onto Form E (Figure 5.6). 

Information needed in the calculation is transferred to Form E: from 
Form B, the number of unit tasks $n_{uq}$ and waits $n_{xq}$ for each of 
these operations is entered; from Form C, the time/unit task $t_u$ and 
time/wait $t_x$ are obtained. 

Step 9. Compute the work time and wait time for each operation 

The time required by a user to perform operation q can be divided 
into working time $w_q$ and time $x_q$ spent waiting for the response of 
the machine. Working time can be computed from 

$$w_q = n_{uq}\,t_u$$

The waiting time can be computed from 

$$x_q = n_{xq}\,t_x$$

Step 10. Compute work time/page, wait time/page, and total time/page 
for each operation. 

By multiplying the working and waiting time for each operation by the 
expected number of occurrences of that operation per page $f_{q/p}$, we can 
obtain the expected working time per page $W_{q/p}$ and waiting time per 
page $X_{q/p}$. 

The average time per page expected to be spent doing operation q is 
then the sum 

$$T_{q/p} = W_{q/p} + X_{q/p}$$

FORM E

Computation Summary

ID    Operation           n_uq  n_xq  w_q    x_q    f_q/p  W_q/p  X_q/p  T_q/p  %
                                      (sec)  (sec)         (sec)  (sec)  (sec)
#Pp   Process new page    2     1     17.0   6.0    1.00   17.0   6.0    23.0   12%
#Ph   Process headings    4           34.0          1.12   38.1          38.1   20%
#Pf   Process figure      9     6     76.5   36.0   0.24   18.4   8.6    27.0   14%
#Pn   Process footnote    6     3     51.0   18.0   0.15   7.6    2.7    10.3   6%
Hb    Indent paragraph    1           8.5           3.13   26.6          26.6   14%
Ft    Specify word font   1           8.5           4.42   37.6          37.6   20%
#Pr   Process refs        2           17.0          1.43   24.3          24.3   13%

Totals                                                     169.6  17.3   186.9  100%

Time/UT      t_u   8.5        Add 25% error time       46.7
Time/Wait    t_x   6.0        Total with error time    233.6
Comp. Fact.  c     .8

w_q   = n_uq t_u
x_q   = n_xq t_x
W_q/p = f_q/p w_q
X_q/p = f_q/p x_q

Figure 5.6. Computation of user time/page for System A 
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Step 11. Compute the total time/page. 

The total time per page is just the sum over the operations:

    T_p = \sum_q T_{q/p} .

Computation of this number for System A gives T_p = 186.9 sec.
The wait time is 17.3 sec/page or about 9% of the time. The most time
consuming operations are expected to be processing the headings and
specifying text word fonts, each taking 20% of the time.

Step 13. Adjust for estimated error time.

The estimate so far has not allowed for the effect of errors. Since
better data are not available, we shall use the 25% error time figure from
Chapter 4. The total time/page, adjusted for error, is

    T'_p = T_p + 0.25 T_p .

Adding in error time raises the estimated time/page for the problem to
233.6 sec/page. 
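
The arithmetic of Steps 9 through 13 can be sketched in a few lines of Python. This is only an illustrative recomputation of Figure 5.6, not part of the original method: the names OPERATIONS_A and time_per_page are hypothetical, and the unit-task count for #Pf is taken to be 9, the value implied by its 76.5 sec work time at 8.5 sec per unit task.

```python
# Illustrative recomputation of Figure 5.6 (System A); names are hypothetical.
T_U = 8.5  # time per unit task, sec
T_X = 6.0  # time per wait, sec

# (code, unit tasks n_uq, waits n_xq, expected occurrences per page f_q/p)
OPERATIONS_A = [
    ("#Pp", 2, 1, 1.00),
    ("#Ph", 4, 0, 1.12),
    ("#Pf", 9, 6, 0.24),  # 9 unit tasks assumed from 76.5 sec / 8.5 sec
    ("#Pn", 6, 3, 0.15),
    ("Hb", 1, 0, 3.13),
    ("Ft", 1, 0, 4.42),
    ("#Pr", 2, 0, 1.43),
]


def time_per_page(operations, t_u=T_U, t_x=T_X):
    """Return (work, wait, total) in seconds per page, summed over operations."""
    work = sum(f * n_u * t_u for _, n_u, _, f in operations)
    wait = sum(f * n_x * t_x for _, _, n_x, f in operations)
    return work, wait, work + wait


work, wait, total = time_per_page(OPERATIONS_A)
total_with_errors = total + 0.25 * total  # Step 13: add 25% error time

print(f"work {work:.1f}  wait {wait:.1f}  total {total:.1f}  "
      f"with errors {total_with_errors:.1f}")
```

Because the form rounds each entry to one decimal place before summing, the unrounded totals computed this way may differ from the figure in the last digit.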

Comparison with System B. 

In System B, pages are numbered automatically, eliminating the page 
numbering step from #Pp. Headings are processed by specifying to 
which class they belong. The system then locates them, renders them in 
the correct font, and numbers them based on a previously specified style 
sheet. The work to be done for #Ph is reduced to a single unit task: 
Th, Specify heading type. System B also keeps track of the figure 
number so that captions are numbered automatically. The macro 
operations which are affected and the computation for System B are 
given in Figure 5.7 and Figure 5.8. 
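
As a rough check on the comparison, the same Form E arithmetic can be repeated with the System B entries of Figure 5.8. The sketch below is illustrative only: OPERATIONS_B and time_per_page are hypothetical names, the row values are transcribed from Figure 5.8 (which lists no Hb row), and the System A total of 233.6 sec/page is the one given in Figure 5.6.

```python
# Illustrative recomputation of Figure 5.8 (System B); names are hypothetical.
T_U = 8.5  # time per unit task, sec
T_X = 6.0  # time per wait, sec

# (code, unit tasks, waits, expected occurrences per page)
OPERATIONS_B = [
    ("#Pp", 1, 1, 1.00),
    ("#Ph", 1, 0, 1.12),
    ("#Pf", 8, 6, 0.24),
    ("#Pn", 6, 3, 0.15),
    ("#Ft", 1, 0, 4.42),
    ("#Pr", 2, 0, 1.43),
]


def time_per_page(operations, t_u=T_U, t_x=T_X):
    """Return (work, wait, total) in seconds per page."""
    work = sum(f * n_u * t_u for _, n_u, _, f in operations)
    wait = sum(f * n_x * t_x for _, _, n_x, f in operations)
    return work, wait, work + wait


work_b, wait_b, total_b = time_per_page(OPERATIONS_B)
total_b_with_errors = 1.25 * total_b   # same 25% error allowance as System A
TOTAL_A_WITH_ERRORS = 233.6            # sec/page, from Figure 5.6

saving_fraction = 1 - total_b_with_errors / TOTAL_A_WITH_ERRORS
wait_share = wait_b / total_b          # share of the error-free time

print(f"System B: {total_b:.1f} sec/page error-free, "
      f"{total_b_with_errors:.1f} with errors")
print(f"saving vs. System A: {saving_fraction:.0%}, wait share: {wait_share:.0%}")
```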

The increased automaticity of System B saves the user 83 sec/page.
System B is predicted to require 152 sec/page, about 35% less time than
System A. In System B, the wait time is the same 17.3 sec, but this is
now 14% of the (error-free) time. Font changes to words in the text now
account for 31% of the time.

ID      Description                                   UTs    Waits
#Pp     Process new page                               1
  Lp      Load Page
#Ph     Process heading                                1
  Th      Specify heading type
#Pf     Process figure and caption                             6
  Lf      Load Page                                            X
  Vf      Specify vertical location of figure                  X
  Hf      Specify horizontal location of figure                X
  Lc      Load caption                                         X
  Vc      Specify vertical location of caption                 X
  Hc      Specify horizontal location of caption               X
  Fc      Specify font of caption
  Nt      Number figure callout in text

(X marks a unit task that involves a wait.)
Figure 5.7. Macro operations for System B (first page)

FORM E: Computation Summary                                                  System: B

      TASK ANALYSIS          SYSTEM ANALYSIS                  ECOL.     RESULTS
                             NUMBER           TIME (sec)      FREQ      TIME/PAGE (sec)
CODE  DESCRIPTION            UTs     Waits    Work    Wait              Work      Wait      Total      %
                             n_{uq}  n_{xq}   w_q     x_q     f_{q/p}   W_{q/p}   X_{q/p}   T_{q/p}
#Pp   Process new page        1       1        8.5     6.0     1.00       8.50      6.00     14.50    12%
#Ph   Process headings        1                8.5             1.12       9.52                9.52     8%
#Pf   Process figure          8       6       68.0    36.0     0.24      16.32      8.64     24.96    21%
#Pn   Process footnotes       6       3       51.0    18.0     0.15       7.65      2.70     10.35     9%
#Ft   Process text fonts      1                8.5             4.42      37.57               37.57    31%
#Pr   Process refs            2               17.0             1.43      24.31               24.31

Figure 5.8. Computation of user time/page for System B

5.3 INTERACTIONS AMONG UNIT TASKS

The above computations assumed that the time to perform a unit task
is independent of the surrounding unit tasks. In fact, unit tasks are not 
independent of each other, but the assumption is quite reasonable for 
this analysis. 

The macro operation #Ph, Process heading, will, according to Figure 
5.3 be composed of four unit tasks: 

Fh Specify font of heading 

Vh Specify vertical spacing of heading 

Hh Specify horizontal placement of heading

Nh Specify heading number.

Expanding these to GLMWV steps gives

(Fh  Specify font of heading)
    Detect heading                        G
    Point to entire heading               L
    Change font of heading                M
    Verify                                V

(Vh  Specify vertical spacing of heading)
    Detect heading                        G
    Point to heading                      L
    Insert lines before heading           M
    Insert lines after heading            M
    Verify                                V

(Hh  Specify horizontal placement of heading)
    Detect heading                        G
    Point to heading                      L
    Move heading                          M
    Verify                                V

(Nh  Specify heading number)
    Detect heading                        G
    Point to heading number               L
    Insert number                         M
    Verify                                V.

But if the user performs these unit tasks one after another he need
not do all of the suboperations. For example, once he has selected the
heading in unit task Fh, he need not do it again in Vh and Hh.
Rewriting with the redundant steps eliminated gives

(Fh  Specify font of heading)
    Detect heading                        G
    Point to entire heading               L
    Change font of heading                M

(Vh  Specify vertical spacing of heading)
    Insert lines before heading           M
    Insert lines after heading            M

(Hh  Specify horizontal placement of heading)
    Move heading                          M

(Nh  Specify heading number)
    Point to heading number               L
    Insert number                         M
    Verify                                V.

Instead of the 16 GLMV suboperations that it nominally takes to do 
the four unit tasks separately, it only takes 9 suboperations to do the 
macro operation #Ph. This gives a compression factor of 9/16 = .56. 
Notice, however, that while the detailed examination showed that some
suboperations are redundant and may be eliminated, the analysis also 
revealed that, as in the case of Vh, some suboperations have extra 
elements (here, the two M suboperations). Repeating this analysis for 
each macro operation shows that the number of extra operations revealed 
is nearly equal to the number of redundant operations and the two 
cancel. The assumption of independence between unit tasks is a good 
approximation for the analysis. 
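
The compression factor is simply the ratio of the suboperations actually needed to the nominal four (G, L, M, V) per unit task. A minimal sketch follows; the function name and the COUNTS table are illustrative, and the per-operation counts are the G, L, M, and V entries recorded for System A in Figure 5.10 (waits excluded).

```python
def compression_factor(actual_suboperations, unit_tasks, nominal_per_task=4):
    """Ratio of suboperations actually needed to the nominal G, L, M, V per unit task."""
    return actual_suboperations / (unit_tasks * nominal_per_task)


# macro operation: ((G, L, M, V) counts, number of unit tasks)
COUNTS = {
    "#Pp": ((3, 1, 3, 1), 2),
    "#Ph": ((1, 2, 5, 1), 4),
    "#Pf": ((8, 10, 9, 8), 9),
    "#Pn": ((7, 9, 6, 6), 6),
    "#Pr": ((2, 2, 2, 2), 2),
}

factors = {op: compression_factor(sum(counts), n_tasks)
           for op, (counts, n_tasks) in COUNTS.items()}
average = sum(factors.values()) / len(factors)

for op, factor in factors.items():
    print(f"{op}: {factor:.2f}")
print(f"average: {average:.2f}")
```

For #Ph this reproduces 9/16 = .56; a factor above 1.0, as for #Pn, means the detailed analysis uncovered more suboperations than the nominal count.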

To show this, the analysis is carried through, recording the number of
GLMWV operations for each macro unit task in Figure 5.9. The
individual compression factors for each of the macro operations (waits
are dealt with separately and so do not enter the calculation) are

    #Pp    Process new page      1.00
    #Ph    Process heading       0.56
    #Pf    Process figure        0.97
    #Pn    Process footnote      1.17
    #Pr    Process reference     1.00

           Average               0.94

Since the average compression ratio is within a few percent of 1.0 (that
is, no compression) it can be ignored.

System: A

ID      Description                                   G     L     M     W     V
#Pp     Process new page                              3     1     3     1     1
  Lp      Load page
  Np      Specify page number
#Ph     Process heading                               1     2     5           1
  Fh      Specify font of heading
  Vh      Specify vertical spacing of heading
  Hh      Specify horizontal placement of heading
  Nh      Specify heading number
#Pf     Process figure and caption                    8    10     9     6     8
  Lf      Load figure
  Vf      Specify vertical location of figure
  Hf      Specify horizontal location of figure
  Lc      Load caption
  Vc      Specify vertical location of caption
  Hc      Specify horizontal location of caption
  Fc      Specify font of caption
  Nc      Specify number of caption
  Nt      Number figure callout in text

Figure 5.9. Re-analysis of macro operations for System A by number of GLMWV
suboperations

System: A                                                          Page 2 of 2

ID      Description                                   G     L     M     W     V
#Pn     Process footnote                              7     9     6     3     6
  Ln      Load footnote                               1     1           1
  Vn      Specify vertical location of footnote       1     2           1
  Hn      Specify horizontal location of footnote     1     2           1
  Fn      Specify footnote font                       1     2
  Nn      Number footnote body                        2     1
  Nt      Number footnote callout                     1     1
#Pr     Process reference                             2     2     2           2
  Ft      Italicize journal name                      1     1     1           1
  Ft      Render volume number in boldface            1     1     1           1

Figure 5.9. (continued)

5.4 GLMWV METHOD

Having gone to the effort to compute the number of G, L, M, W, 
and V suboperations in each macro operation, it is possible to compute 
from measured times for these suboperations the answer to the problem. 
Reasonable values for the required time to perform each suboperation 
can be read from Table 4.2. These values (rounded to the nearest .5 sec 
to emphasize their approximate nature) are G: 2.5 sec, L(command): 
4.0 sec, L(pointing): 2.0 sec, M: 2.5 sec, and V: 1.0 sec. W was 
previously estimated to be 6.0 sec. Figure 5.10 gives the computation. 
This computation is a refinement of the computation using only the 
number of unit tasks and the number of waits which appeared in Figure 
5.6. The resulting (error-free) time per page from the new calculation is 
200.9 sec as compared with 186.9 sec computed in Figure 5.6, a 
difference of only about 7%. Thus, for a conceptual design problem in 
which rough figures will suffice, the unit task calculation method gives a 
close enough approximation to the GLMWV method and with less effort. 
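
The GLMWV computation can be sketched as below. This is an illustrative recomputation rather than the original form: the names TIMES, ROWS, and glmwv_time_per_page are hypothetical, the counts and frequencies are transcribed from Figure 5.10, and, as in that figure, the 4.0 sec command value is used for every L.

```python
# Illustrative recomputation of Figure 5.10; names are hypothetical.
TIMES = {"G": 2.5, "L": 4.0, "M": 2.5, "W": 6.0, "V": 1.0}  # sec per suboperation

# (code, suboperation counts, expected occurrences per page)
ROWS = [
    ("#Pp", {"G": 3, "L": 1, "M": 3, "W": 1, "V": 1}, 1.00),
    ("#Ph", {"G": 1, "L": 2, "M": 5, "V": 1}, 1.12),
    ("#Pf", {"G": 8, "L": 10, "M": 9, "W": 6, "V": 8}, 0.24),
    ("#Pn", {"G": 7, "L": 9, "M": 6, "W": 3, "V": 6}, 0.15),
    ("Hb", {"G": 1, "L": 1, "M": 1, "V": 1}, 3.13),
    ("Ft", {"G": 1, "L": 1, "M": 1, "V": 1}, 4.42),
    ("#Pr", {"G": 2, "L": 2, "M": 2, "V": 2}, 1.43),
]


def glmwv_time_per_page(rows, times):
    """Return (seconds per page by suboperation kind, total seconds per page)."""
    per_kind = {
        kind: sum(freq * counts.get(kind, 0) * t for _, counts, freq in rows)
        for kind, t in times.items()
    }
    return per_kind, sum(per_kind.values())


per_kind, total = glmwv_time_per_page(ROWS, TIMES)
UNIT_TASK_TOTAL = 186.9  # sec/page, from Figure 5.6
relative_difference = (total - UNIT_TASK_TOTAL) / UNIT_TASK_TOTAL

print({kind: round(t, 1) for kind, t in per_kind.items()})
print(f"total {total:.1f} sec/page, {relative_difference:.1%} above the unit task estimate")
```

Summing without the intermediate rounding used on the form gives a total within a few tenths of a second of the 200.9 sec shown in Figure 5.10.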



5.5 CAVEATS 

With a simple model of this sort caveats must be made. The most
important caveat is the extremely superficial analysis of the task
environment. The method is not suitable for detecting significant
interactions or for discovering that some other phenomenon (e.g., memory
load) plays a dominant role. Thus it is possible that the analysis is simply
wide of the mark in some gross way.

Further consideration reveals several sources of variability not
considered in the above calculations. There is variability of factors of
about two depending on the skill of the operator, and of (independent)
factors of about two depending on the structure of the command and
display system (e.g., RCG vs Poet editors) and of some (unknown)
factor depending on the difficulty of the actual manuscript material. A
final source of the variability comes directly from the crudity of the
method analysis. The actual methods are substantially more conditional
than the simple linear analyses used.

The current analysis does not deal with errors directly. Since errors
do get converted into extra time, one can develop a factor to estimate
the "expansion of time due to error", and this is what we have done.
An important issue on errors is the impact of big errors, the ones
that occur with relatively low frequency but take 10, even 20, times the
mean unit task time for recovery. It is possible that such errors could
be a significant fraction of the total time.

FORM E 2: Computation Summary                                                                                     System: A

      TASK ANALYSIS    SYSTEM ANALYSIS                                              ECOL.     RESULTS
                       NUMBER                  TIME (sec)                           FREQ      TIME/PAGE (sec)
CODE  DESCR            G    L    M    W    V   t_G    t_L    t_M    t_W    t_V      f_{q/p}   T_G    T_L    T_M    T_W    T_V     TOT     %
#Pp   New page         3    1    3    1    1    7.5    4.0    7.5    6.0    1.0      1.00      7.5    4.0    7.5    6.0    1.0    26.0   13%
#Ph   Headings         1    2    5         1    2.5    8.0   12.5           1.0      1.12      2.8    9.0   14.0           1.1    26.9   13%
#Pf   Figures          8   10    9    6    8   20.0   40.0   22.5   36.0    8.0      0.24      4.8    9.6    5.4    8.6    1.9    30.3   15%
#Pn   Footnotes        7    9    6    3    6   17.5   36.0   15.0   18.0    6.0      0.15      2.6    5.4    2.2    2.7    0.9    13.8    7%
Hb    Indent para.     1    1    1         1    2.5    4.0    2.5           1.0      3.13      7.8   12.5    7.8           3.1    31.2   16%
Ft    Text fonts       1    1    1         1    2.5    4.0    2.5           1.0      4.42     11.0   17.7   11.0           4.4    44.1   22%
#Pr   References       2    2    2         2    5.0    8.0    5.0           2.0      1.43      7.2   11.4    7.1           2.9    28.6   14%
      Totals                                                                                  43.7   69.6   55.0   17.3   15.3   200.9
      Add 25% error time                                                                                                          50.2
      Total with error time                                                                                                      251.1

Parameters: Time/GET t_G = 2.5 sec;  Time/LOCATE t_L = 4.0 sec;  Time/MODIFY t_M = 2.5 sec;
            Time/WAIT t_W = 6.0 sec;  Time/VERIFY t_V = 1.0 sec.

Figure 5.10. Recomputation of user time/page for System A using times for
GLMWV suboperations

The method described in this chapter is most suitable for use at the
conceptual design level. What the method provides is a quick and
simple way of obtaining order of magnitude estimates of the time
required by the user to perform a task with a given sort of machine.
User's time can, of course, be converted into costs and used in an
economic analysis.

An important element in the design of the man-computer interface is
the method of pointing by which the user indicates to the computer his
selection of some element on the computer display. This is especially
important for text-editing programs using a CRT display (such as RCG
and Editor Y) where the user may repeatedly use a pointing device to
select the text he wishes to modify or to invoke a command from a menu
displayed on the screen. The choice of pointing device may have a
significant impact on the ease with which the selections can be made,
and hence, since pointing typically occurs with high frequency, on the
success of the entire system. In this chapter, we shall focus on the
psychology of this important component of a text editing system.

† Adapted from Card, English, and Burr (in press).

English, Engelbart, and Berman (1967) measured mean pointing times
and error rates for the mouse, lightpen, Grafacon tablet, and position and 
rate joysticks. They found the mouse to be the fastest of the devices, but 
did not investigate the effect of distance to target. They also gave no 
indication of the variability of their measures. Goodwin (1975) measured 
pointing times for the lightpen, lightgun, and Saunders 720 step keys. 
She found the light pen and lightgun equally fast and much superior to 
the Saunders 720 step keys. However, she used only one target size and 
did not investigate distance. In addition, her results also show large 
learning effects which are confounded with the device comparisons. Both 
studies were more concerned with the evaluation of devices than with the 
development of models from which performance could be predicted. In 
another line of development Fitts and others (Fitts, 1954; Fitts and 
Peterson, 1964; Fitts and Radford, 1966; Knight and Dagnal, 1967; 
Welford, 1968) developed and tested the relation between distance, size 
of target, and hand movement time. Such a relation might potentially be 
used to predict pointing times for devices involving continuous hand 
movements; however this has not been tested directly. In particular it 
was not known whether Fitts's Law would hold for targets of the shape 
and character of text strings. 

This chapter examines text selection performance with four devices: 
the mouse, a rate-controlled isometric joystick, step keys, and text keys. 
The study differs from the English et al. and Goodwin studies in that 
distance, target size, and learning are all simultaneously controlled and a 
different set of devices is measured. Also, unlike those studies, an 
attempt is made to give a theoretical account of the results. In particular, 
performance on the continuous movement devices is tested against the 
predictions of Fitts's Law.



6.1 METHOD 

Subjects 

Three men and two women, all undergraduates at Stanford
University, served as subjects in the experiment. None had ever used
any of the devices previously and all had little or no experience with
computers. Subjects were paid $3.00 per hour with a $20.00 bonus for
completing the experiments. One of the five subjects was very much
slower than the others and was eliminated from the experiment.

Pointing Devices 

Four pointing devices were tested (see Plate 6.1). Two were 
continuous devices: the mouse and a rate-controlled isometric joystick. 
Two were key operated: the step keys and the text keys. The devices
had been optimized informally by testing them on local users, adjusting 
the device parameters so as to maximize performance. 

The mouse, a version of the device described in English et al. (1967), 
was a small device which sat on the table to the right of the keyboard, 
connected by a thin wire. On the undercarriage were two small wheels, 
mounted at right angles to each other. As the mouse moved over the 
table one wheel coded the amount of movement in the X-direction, the 
other the movement in the Y-direction. As the mouse moved, a cursor 
moved simultaneously on the CRT, two units of screen movement for 
each unit of mouse movement. 

The joystick used was a small strain gauge on which had been 
mounted a rubber knob 1.25 cm in diameter. Applying force to the 
joystick in any direction did not produce noticeable movement in the 
joystick itself, but caused the cursor to move in the appropriate direction 
at a rate = 0.0178 (force)^2 in cm/sec, where force is measured in
Newtons. For forces less than about 4 Newtons, the cursor did not move 
at all, and the equation ceased to hold in the neighborhood of 45 
Newtons as the rate approached a ceiling of about 40 cm/sec. 
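
The joystick's force-to-rate mapping can be sketched as a small function. This is an assumption-laden illustration, not the device's actual control law: the exponent is taken to be 2 (the value consistent with a rate near 40 cm/sec at about 45 Newtons), the dead zone is treated as a sharp cutoff at 4 Newtons, and the ceiling is modeled as a hard clip, whereas the text describes both only approximately.

```python
def joystick_cursor_rate(force_newtons, gain=0.0178, exponent=2.0,
                         dead_zone=4.0, ceiling=40.0):
    """Approximate cursor rate in cm/sec for a given applied force in Newtons.

    Assumed simplifications: no movement below the dead zone, a power law
    above it, and a hard clip at the ceiling rate.
    """
    if force_newtons < dead_zone:
        return 0.0
    return min(gain * force_newtons ** exponent, ceiling)


for force in (2, 4, 10, 20, 45, 60):
    print(f"{force:>3} N -> {joystick_cursor_rate(force):5.2f} cm/sec")
```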

The step keys were the familiar five key cluster found on many CRT 
terminals. Surrounding a central HOME key were keys to move the 
cursor in each of four directions. Pressing the HOME key caused the 
cursor to go to the upper left corner of the text. Pressing one of the
horizontal keys moved the cursor 1 character (0.246 cm on the average) 
along the line. Pressing a vertical key moved the cursor one line (0.456 
cm) up or down. Holding down one of the keys for more than 0.100 sec 
caused it to go into a repeating mode, producing one step in the vertical 
direction each 0.133 sec or one step in the horizontal direction each 0.067 
sec (3.43 cm/sec vertical movement, 3.67 cm/sec horizontal movement). 

Plate 6.1. Pointing devices tested.

The text keys were similar to keys appearing on several commercial
"word processing" terminals. Depressing the PARAGRAPH key caused
the cursor to move to the beginning of the next paragraph. Depressing
the LINE key caused the cursor to move downward to the same position
in the next line. The WORD key moved the cursor forward one word;
the CHARACTER key moved the cursor forward one character.
Holding down the REVERSE key while pressing another key caused the
cursor to move opposite the direction it would otherwise have moved.
The text keys could also be used in a repeating mode. Holding the
LINE, WORD, or CHARACTER keys down for longer than 0.100 sec
caused that key to repeat at 0.133 sec per repeat for the LINE key, 0.100
sec per repeat for the WORD key, or 0.067 sec per repeat for the
CHARACTER key. Since there were 0.456 cm/line, 1.320 cm/word,
and 0.246 cm/character, movement rates were 3.43 cm/sec for the LINE
key, 13.20 cm/sec for the WORD key, and 3.67 cm/sec for the
CHARACTER key. 
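
The quoted movement rates follow directly from dividing the distance covered per keystroke by the repeat interval. A minimal check (the dictionary names are illustrative):

```python
# Distance moved per step (cm) and repeat interval (sec) for each text key.
STEP_CM = {"LINE": 0.456, "WORD": 1.320, "CHARACTER": 0.246}
REPEAT_SEC = {"LINE": 0.133, "WORD": 0.100, "CHARACTER": 0.067}

# Cursor speed in repeating mode: cm per step divided by seconds per step.
rates = {key: STEP_CM[key] / REPEAT_SEC[key] for key in STEP_CM}

for key, rate in rates.items():
    print(f"{key:<9} {rate:5.2f} cm/sec")
```

The LINE and CHARACTER figures are also the vertical and horizontal repeat rates of the step keys, which use the same step sizes and repeat intervals.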

Procedure 

Subjects were seated in front of a computer terminal with a CRT for 
output, a keyboard for input, and one of the devices for pointing at 
targets on the screen. On each trial a page of text was displayed on the 
screen. Within the text a single word or phrase, the target, was 
highlighted by inverting the black/white values of the text and 
background in a rectangle surrounding the target. The subject struck the 
space bar of the keyboard with his right hand, then, with the same hand,
reached for the pointing device and directed the cursor to the target. 
The cursor thus positioned, the subject pressed a button "selecting" the 
target as he would were he using the device in a text editor. For the 
mouse, the button was located on the device itself. For the other 
devices, the subject pressed a special key with his left hand. 

Design 

Text selections and targets were so arranged that there were five 
different distances from starting position to target, 1, 2, 4, 8, or 16 cm, 
and four different target sizes, 1, 2, 4, or 10 characters. All targets were 
words or groups of words. Ten different instances of each distance ×
target size pair were created, varying the location of the target on the
display and the angle of hand movement to give a total of 200, randomly
ordered, unique stimuli.
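
The factorial structure of the stimulus set can be sketched as follows. The code only enumerates the distance × target size × instance combinations and shuffles them; the instance index stands in for the particular screen location and approach angle, which are not modeled, and the fixed seed is an arbitrary choice for reproducibility.

```python
import itertools
import random

DISTANCES_CM = [1, 2, 4, 8, 16]     # distance from starting position to target
TARGET_SIZES_CHARS = [1, 2, 4, 10]  # target size in characters
INSTANCES_PER_CELL = 10             # different placements per distance-size pair

# One tuple per unique stimulus: (distance, target size, instance index).
stimuli = [
    (distance, size, instance)
    for distance, size in itertools.product(DISTANCES_CM, TARGET_SIZES_CHARS)
    for instance in range(INSTANCES_PER_CELL)
]

random.Random(0).shuffle(stimuli)   # random presentation order
print(len(stimuli), "stimuli; first three:", stimuli[:3])
```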

Each subject repeated the experiment with each device. The order in 
which subjects used the devices was randomized. At the start of each 
day, the subjects were given approximately twenty warmup trials to 
refresh their memory of the procedure. All other trials were recorded as 
data. At the end of each block of twenty trials they were given feedback 
on the average positioning time and average number of errors for those 
trials. This feedback was found to be important in maintaining subjects'
motivations. At the end of each 200 trials they were given a rest break
of about fifteen minutes. Subjects normally accomplished 600 trials/day
involving about two to three hours of work. They each used a particular
device until the positioning time was no longer decreasing significantly
with practice (operationally defined as when the first and last thirds of a
block of the last 600 trials excluding the first 200 trials of a day did not
differ significantly in positioning time at the p < 0.05 level using a t-test).
An approximation to this criterion was reached in from 1200 to
1800 trials (four to six hours) on each device. Of the 20 subject ×
device pairs, 15 reached this criterion, 3 performed worse in their last
trials (largely because some time elapsed between sessions), and only 2
were continuing (slightly) to improve.



6.2 RESULTS 

Improvement of Performance with Practice 

The learning curve which gives positioning time as a function of the
amount of practice can be approximated (De Jong, 1957) by

    T_N = T_1 N^{-\alpha}                                          (6.1)

where

    T_1 = estimated positioning time on the first block of trials,
    T_N = estimated positioning time on the Nth block of trials,
    N   = trial block number, and
    α   = an empirically determined constant.

This form is convenient since taking the log of both sides produces an
equation linear in log N,

    \log T_N = \log T_1 - \alpha (\log N).                         (6.2)

Thus the ease of learning for each device can be described by two
numbers, T_1 and α, which may be conveniently determined
empirically by regressing log T_N on log N. Plate 6.2 shows the results of
plotting the data from error-free trials according to Equation 6.2. Each
point on the graph is the average of a block N of twenty contiguous
trials from which error trials have been excluded. Only the first 60 trial
blocks are shown. Since some subjects reached criterion at this point, not
all continued on to further trials. The values predicted by the equation
are given as the straight line drawn through the points. The average
target size in each block was 4.23 cm (the range of the average target
sizes for different trial blocks was 3.95 to 4.50 cm); the average distance
to the target was 6.13 cm (range 5.90 to 6.42 cm).

Plate 6.2. Learning curves for pointing devices.

The parameters T_1 and α, as determined by the regressions, are given
in Table 6.1, along with the standard error and squared multiple
correlation from the regression analysis. Practice causes more
improvement in the mouse and text keys than on the other two devices.
The step keys, in particular, show very little improvement with practice.
Equation 6.2 explains 39% of the variance in the average positioning time
for a block of trials for the step keys, 61% to 66% of the variance for the
other devices. The fit, at least for the mouse and the joystick, is actually
better than these numbers suggest. Since subjects did 30 blocks of trials
on a day typically followed by a pause of a day or two before they could
be rescheduled, a break in the learning curve is expected at that point
and indeed such a break is quite evident for the mouse and the joystick
between the 30th and 31st blocks. Fitting Equation 6.2 to only the first
day increases the percentage of variance explained to 91% for the mouse
and 83% for the joystick. In the case of the step keys and text keys there 
is no such obvious day effect. 
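
The regression of log positioning time on log block number can be sketched with an ordinary least-squares fit. The example below is illustrative only: fit_power_law is a hypothetical helper, and the input is synthetic, noise-free data generated from the mouse parameters reported in Table 6.1 (2.20 sec and 0.13), not the experimental measurements.

```python
import math


def fit_power_law(block_times):
    """Fit T_N = T_1 * N**(-alpha) by least squares on log T_N versus log N.

    block_times[i] is the mean positioning time (sec) on trial block i + 1.
    Returns (T_1, alpha).
    """
    xs = [math.log(n) for n in range(1, len(block_times) + 1)]
    ys = [math.log(t) for t in block_times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free learning curve for 60 trial blocks (illustration only).
synthetic_times = [2.20 * n ** -0.13 for n in range(1, 61)]
t_1, alpha = fit_power_law(synthetic_times)
print(f"T_1 = {t_1:.2f} sec, alpha = {alpha:.2f}")
```

With real block averages the fit would also yield residuals, from which the standard error and squared correlation of the kind reported in Table 6.1 could be computed.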

Overall Speed 

In order to compare the devices after learning has nearly reached
asymptote (as would be the case for office workers using them daily), a
sample of each subject's performance on each device was examined
consisting of the last 600 trials excluding the first 200 trials of a day (in
order to diminish warmup effects). The remaining analyses will be based
on this subset of the data, excluding those trials on which errors
occurred. Table 6.2 gives the homing time, positioning time, and total
time for each device averaging over all the distances and target sizes.
Homing time was measured from the time the subject's right hand left
the space bar until the cursor had begun to move. Positioning time was
measured from when the cursor began to move until the selection button
had been pressed. From the table, it can be seen that homing time
increases slightly with the distance of the device from the keyboard. The
longest time required is to reach the mouse, the shortest to reach the step
keys. Although the text keys are near the keyboard, they take almost as
long to reach as the mouse. Either it is more difficult to position the
hands on the text keys or, as seems likely, subjects often spent some time
planning the strategy for their move in the time between hitting the
space bar to start the clock and the time when they begin pressing the
keys. Further evidence for this hypothesis comes from the relatively high
standard deviation observed for the homing time of the text keys. While
the differences in the homing times among all device pairs except the
mouse vs. the text keys are reliable statistically (at p < 0.05 or better
using a t-test), the differences are actually quite small. For example,
while the step keys can be reached 0.15 sec sooner than the mouse, they
take 1.02 sec longer to position. Thus the differences in the homing
times are insignificant compared to the differences between the
positioning times.

TABLE 6.1
Learning Curve Parameters

DEVICE       T_1 (sec)    α       Learning Curve Equation*    Std. error (sec)    R^2
Mouse          2.20      0.13     T_N = 2.20 N^{-0.13}             0.12           0.66
Joystick       2.19      0.08     T_N = 2.19 N^{-0.08}             0.08           0.62
Step Keys      3.03      0.07     T_N = 3.03 N^{-0.07}             0.11           0.39
Text Keys      3.86      0.15     T_N = 3.86 N^{-0.15}             0.16           0.61

* N is number of trial blocks. There are 20 trials in each block.

TABLE 6.2
Overall Times

             Movement time for non-error trials (sec)
             Homing Time     Positioning Time    Total Time      Error Rate
DEVICE        M      SD        M      SD          M      SD       M      SD
Mouse        0.36   0.13      1.29   0.42        1.66   0.48      5%     22%
Joystick     0.26   0.11      1.57   0.54        1.83   0.57             31%
Step Keys    0.21   0.30      2.31   1.52        2.51   1.64     13%     33%
Text Keys    0.32   0.61      1.95   1.30        2.26   1.70             28%

The mouse is easily the fastest device, the step keys the slowest. As a 
group, the continuous devices (the mouse and the joystick) are faster 
than the key-operated devices (the step keys and text keys). Differences 
between the devices are all reliable at p < 0.001 using t-tests.

Effect of Distance and Target Size

The effect of distance on positioning time is given in Plate 6.3. At all 
distances greater than 1 cm, the continuous devices are faster. The 
positioning time for both continuous devices seems to increase 
approximately with the log of the distance. The time for the step keys 
increases rapidly as the distance increases, while the time for the text 
keys increases somewhat less than as the log of the distance, owing to the 
existence of keys for moving relatively large distances with a single 
stroke. Again the mouse is the fastest device, and its advantage increases 
with distance. 

Plate 6.4 shows the effect of target size on positioning time. The 
positioning time for both the mouse and the joystick decreases with the 
log of the target size. The time for the text keys is independent of target 
size and the positioning time for the step keys also decreases roughly 
with the log of the target size. Again the mouse is the fastest device, and 
again the continuous devices as a group are faster for all target sizes. 

Plate 6.4. Effect of target size on positioning time.

Effect of Approach Angle

The targets in text editing are rectangles often quite a bit wider than
they are high. Hence they might present a different problem when
approached from different angles. In addition, the step keys and text
keys work somewhat differently when moving horizontally than when
moving vertically. To test if the direction of approach has an effect on
positioning time, the target movements were classified according to
whether they were vertical (0 to 22.5 degrees), diagonal (22.5 degrees to
67.5 degrees), or horizontal (67.5 degrees to 90 degrees). Analysis of
variance shows the angle makes a significant difference in every case
except for the mouse. The joystick takes slightly longer to position when
the target is approached diagonally. The step keys take longer when
approached horizontally than when approached vertically, a consequence
probably deriving from the fact that a single keystroke would move the
cursor almost twice as far vertically as horizontally. By contrast, the text
keys take longer to position vertically, reflecting the presence of the
WORD key. The differences induced by direction are not of great
consequence, however. For the joystick it amounts to 3% of the mean
positioning time; for the step keys 9%; for the text keys 5%.
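
The three-way classification of approach angle can be sketched as follows. This is only an illustration: the function name is hypothetical, the angle is assumed to be measured from the vertical axis (as the ranges in the text imply), and the assignment of movements falling exactly on 22.5 or 67.5 degrees is an assumption, since the text does not say which class they belong to.

```python
import math


def classify_approach(dx: float, dy: float) -> str:
    """Classify a movement by its angle from the vertical axis.

    dx and dy are the horizontal and vertical components of the movement.
    Boundary handling at 22.5 and 67.5 degrees is an assumption.
    """
    # atan2(|dx|, |dy|) is 0 degrees for a purely vertical movement
    # and 90 degrees for a purely horizontal one.
    angle = math.degrees(math.atan2(abs(dx), abs(dy)))
    if angle < 22.5:
        return "vertical"
    if angle <= 67.5:
        return "diagonal"
    return "horizontal"


print(classify_approach(0.0, 5.0))  # vertical
print(classify_approach(3.0, 3.0))  # diagonal
print(classify_approach(5.0, 0.0))  # horizontal
```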

Errors 

Of the four devices tested, the mouse had the lowest overall error 
rate, 5%; the step keys had the highest, 13%. The differences are reliable 
at p < 0.05 or better using t-tests. There is only a very slight increase in 
error rate with distance. However, there is a decrease in error rate with 
target size for every device except the text keys (Plate 6.5). This finding 
replicates the result of Fitts and Radford (1966). In an investigation of 
self-initiated, discrete, pointing movements using a stylus, there was a 
similar marked reduction in errors as the target increased in size, but 
only a slight increase in error rate as the distance to the target increased. 



6.3 DISCUSSION 

While these empirical results are of direct use in selecting a pointing 
device, it would obviously be of greater benefit if a theoretical account of 
the results could be made. For one thing, the need for some 
experiments might be obviated; for another, ways of improving pointing 
performance might be suggested. Fortunately, a first-order account for 
the devices of this experiment is not hard to give. 

Mouse 

The time to make a hand movement can be described by a version of 
Fitts's Law (Welford, 1968), 



    T_{pos} = K_0 + K \log_2 (D/S + 0.5) sec,    (6.3)

where

    T_{pos} = positioning time,
    D = distance to the target,
    S = size of the target, and
    K_0, K = constants.

Plate 6.5. Effect of target size on error rate.

Here the constant K_0 includes within it the time for the hand initially to
adjust its grasp on the mouse and the time to make the selection with the
selection button. A constant of K ≈ 0.1 sec/bit (10 bits/sec) appears in
a large number of studies on movement. This number is a measure of 
the information processing capacity of the eye-hand coordinate system. 
For single, discrete, subject-paced movements, the constant is a little less 
than 0.1 sec/bit. Fitts and Radford (1966) get a value of 0.078 sec/bit 
(12.8 bits/sec, recomputed from their Figure 1, Experiment I, for the 
experimental condition where accuracy is stressed). Pierce and Karlin 
(1957) get maximum rates of 0.085 sec/bit (11.7 bits/sec) in a pointing 
experiment. For continuous movement, repetitive, experimenter-paced 
tasks, such as alternately touching two targets with a stylus or pursuit 
tracking, the constant is slightly above 0.1 sec/bit. Elkind and Sprague 
(1961) get maximum rates of 0.135 sec/bit (7.4 bits/sec) for a pursuit 
tracking task. Fitts's original dotting experiment as replotted by Welford 
(1968, p. 148) gives a K of 0.120 sec/bit as does Welford's own study 
using the actual distance between the dots, the same measure of distance 
used in this study. 

Fitts's Law predicts that plotting positioning time as a function of
\log_2 (D/S + 0.5) should give a straight line. As the solid line in Plate
6.6 shows, this prediction is confirmed. Furthermore, the slope of the
line K should be in the neighborhood of 0.1 sec/bit. Again the
prediction is confirmed. The equation for the line in Plate 6.6 as
determined by regression analysis is

    T_{pos} = 1.03 + 0.096 \log_2 (D/S + 0.5) sec.    (6.4)

The equation has a standard error of 0.07 sec and explains 83% of the 
variance of the means for each condition. This is roughly comparable to 
the percentage of variance explained by Fitts and Radford. The slope of 
0.096 sec/bit is in the 0.1 sec/bit range found in other studies. Since the 
standard error of estimate for K is 0.008 sec/bit, the mouse would seem 
to be close to, but slightly slower than, the optimal rate of around 0.08 
sec/bit observed for the stylus and for finger pointing. 

The values for positioning time obtained in this experiment are 
apparently in good agreement with those obtained by English et al. 
Making the assumption that their CRT characters were about the same
width as ours and assuming an intermediate target distance of about 8
cm, Equation 6.4 (plus the addition of the 0.36 sec homing time from
Table 6.2) predicts 1.87 sec for 1-character targets (English et al. got 1.93
sec) and 1.66 sec for "word" targets of 5 characters (they got 1.68 sec).
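
The two predictions above can be reproduced with a short calculation. The sketch below is illustrative only. It assumes a character width of 0.246 cm, taken here from the horizontal step size quoted in the Step Keys discussion; the text does not state the character width used for this comparison, so that constant is an assumption, as are the function names.

```python
import math

CHAR_WIDTH_CM = 0.246  # assumed character width (not stated for this comparison)
HOMING_SEC = 0.36      # homing time cited from Table 6.2


def mouse_positioning_time(distance_cm: float, size_cm: float) -> float:
    """Equation 6.4: fitted positioning time for the mouse, in seconds."""
    return 1.03 + 0.096 * math.log2(distance_cm / size_cm + 0.5)


def predicted_selection_time(distance_cm: float, n_chars: int) -> float:
    """Homing time plus positioning time for a target n_chars wide."""
    return HOMING_SEC + mouse_positioning_time(distance_cm, n_chars * CHAR_WIDTH_CM)


print(round(predicted_selection_time(8, 1), 2))  # 1.87
print(round(predicted_selection_time(8, 5), 2))  # 1.66
```

Under that assumed width, the calculation agrees with the 1.87 sec and 1.66 sec figures quoted in the text.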

Joystick 

Although it is a rate-controlled device instead of a position device, we 
might wonder if the joystick follows Fitts's Law. Plotting the average 
time per positioning for each distance × size cell of the experiment
according to Equation 6.3 shows that there is an approximate fit to

    T_{pos} = 0.99 + 0.220 \log_2 (D/S + 0.5).    (6.5)

Equation 6.5 has a standard error of 0.13 sec and explains 89% of the 
variance of the means. The size of the slope K shows that information is
being processed at only half the speed of the mouse and
significantly below the maximum rate. Closer examination gives some
insight into the difficulty. The points for the joystick in Plate 6.6
actually form a series of parallel lines, one for each distance, each with a
slope of around 0.1 sec/bit. Setting K to 0.1 sec/bit, we can therefore
write as an alternative model

    T_{pos} = K_D + 0.1 \log_2 (D/S + 0.5).

K_D is the intercept for distance D. From the figure, K_D is about 1.05
sec for D = 1 cm, 1.12 sec for 2 cm, 1.26 sec for 4 cm, 1.44 sec for 8
cm, and 1.68 sec for 16 cm. For this model the standard error of the fit
is reduced to 0.07 sec, the same as for the mouse. (Since the slope was
not determined by the regression, a comparable R^2 cannot be computed.)
Thus the tested joystick can be thought of as a Fitts's Law device with a
slope twice that for hand movements; or it can be thought of as a Fitts's
Law device with the expected slope, but having an intercept which
increases with distance. The problem with this joystick is probably
related to the non-linearity in the control (Poulton, 1974; Craik and
Vince, 1963). It should be noted that for the 1 cm distance (where the
effect of non-linearity is slight) the positioning time is virtually the same
as for the mouse. Thus the possibility of designing a joystick with
performance characteristics comparable to the mouse is by no means
excluded. 
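
The alternative joystick model can be compared with the fitted mouse model distance by distance. The following is a sketch under stated assumptions: the intercepts are the approximate values read from the figure as quoted above, the mouse coefficients are those of the regression fit (1.03 and 0.096), and the 0.246 cm target size used in the loop is an illustrative choice, not a value given for this comparison.

```python
import math

# Approximate joystick intercepts K_D (sec), keyed by distance in cm,
# as quoted in the text for the alternative model.
JOYSTICK_INTERCEPT_SEC = {1: 1.05, 2: 1.12, 4: 1.26, 8: 1.44, 16: 1.68}


def joystick_time(distance_cm: int, size_cm: float) -> float:
    """Alternative model: distance-dependent intercept, fixed 0.1 sec/bit slope."""
    k_d = JOYSTICK_INTERCEPT_SEC[distance_cm]
    return k_d + 0.1 * math.log2(distance_cm / size_cm + 0.5)


def mouse_time(distance_cm: float, size_cm: float) -> float:
    """Fitted mouse model (Equation 6.4)."""
    return 1.03 + 0.096 * math.log2(distance_cm / size_cm + 0.5)


TARGET_CM = 0.246  # illustrative target size (assumption)
for d in sorted(JOYSTICK_INTERCEPT_SEC):
    gap = joystick_time(d, TARGET_CM) - mouse_time(d, TARGET_CM)
    print(f"{d:2d} cm: joystick slower by {gap:.2f} sec")
```

With these numbers the gap is a few hundredths of a second at 1 cm and grows steadily with distance, which is the pattern the text attributes to the rising intercept.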

Step Keys 

As a first approximation one might expect the time to use the step 
keys to be governed by the number of keystrokes which must be used to 
move the cursor to the target. Since the keys can only move the cursor 
vertically or horizontally, the number of keystrokes is D_x/0.456 +
D_y/0.246, where D_x and D_y are the horizontal and vertical components
of distance to the target; 0.456 cm is the size of a vertical step and 0.246
cm is the size of a horizontal step. Hence positioning time should be

    T_{pos} = K_0 + C (D_x/0.456 + D_y/0.246).    (6.6)

This equation with K_0 = 1.20 sec and C = 0.052 sec/keystroke has a
standard error of 0.54 sec and explains 84% of the variance of the means. 
Since the tapping rate is around 0.15 sec/keystroke, C is much too 
fast to be identified with the pressing of a key. It is also too fast to be 
identified with the 0.067 sec/keystroke automatic repetition mode. Plate 
6.7 shows positioning time plotted against the predicted number of 
keystrokes. The long solid line is Equation 6.6 with the above 
parameters. The figure shows that positioning time is linear with the 
number of keystrokes until the predicted number of keystrokes becomes 
large (that is, the distance to the target is long). In these cases the user 
often has the opportunity to reduce positioning time by using the HOME 
key. 

Fitting Equation 6.6 to the first part of the graph (D_x/0.456 +
D_y/0.246 < 40) gives

    T_{pos} = 0.98 + 0.074 (D_x/0.456 + D_y/0.246).

The equation, indicated as a short solid line on the figure, has a standard 
error of 0.18 sec and explains 95% of the variance in the means. The 
reasonable slope of 0.074 sec/keystroke shows that the 0.067 
sec/keystroke automatic repetition feature was heavily used. 
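
The two step-key fits can be compared directly as functions of the predicted number of keystrokes. This is a minimal sketch with hypothetical function names; it takes the keystroke count as given rather than deriving it from the distance components.

```python
def step_keys_all_data_fit(keystrokes: float) -> float:
    """Equation 6.6 fitted to all data points (K_0 = 1.20, C = 0.052)."""
    return 1.20 + 0.052 * keystrokes


def step_keys_short_range_fit(keystrokes: float) -> float:
    """Fit restricted to fewer than 40 predicted keystrokes."""
    if keystrokes >= 40:
        raise ValueError("fit applies only below 40 predicted keystrokes")
    return 0.98 + 0.074 * keystrokes


for n in (5, 10, 20, 30):
    print(n, round(step_keys_short_range_fit(n), 2), round(step_keys_all_data_fit(n), 2))
```

The two lines cross near 10 keystrokes. Beyond that point the restricted fit rises faster, consistent with the flattening at long distances that the text attributes to use of the HOME key.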

Text Keys 

The text keys present the user on most trials with a choice of methods 
to reach the target. For example, he might press the PARAGRAPH key 
repeatedly until the cursor has moved to the paragraph containing the
target. He could then press the LINE key repeatedly until it
is on the target line, then use the WORD key to bring it over to the
target. Or he might use the PARAGRAPH key to move to the
paragraph after the target, then, holding the REVERSE key down, use
the LINE key to back up to the line after the target line, and finally,
using REVERSE and WORD, back up until he hits the target. In fact,
there are 26 different methods for moving the cursor to the target,
although only a subset will be possible in a given situation. The fastest
method will depend on where the target is located relative to the starting 
position and the boundaries of surrounding lines and paragraphs. 

A reasonable hypothesis would be that positioning time is 
proportional to the number of keystrokes and that for well practiced 
subjects the number of keystrokes will be the minimum necessary. To
test this hypothesis each trial was analyzed to determine the minimum
number of keystrokes N_{min} necessary to hit the target. The average
positioning time as a function of N_{min} is plotted as the open circles in
Plate 6.7. A least squares fit gives

    T_{pos} = 0.66 + 0.209 N_{min}.

The standard error is 0.24 sec and the equation explains 89% of the 
variance of the means. The keystroke rate of 0.209 sec/keystroke is very 
reasonable, being approximately equal to the typing rate for random 
words (Devoe, 1967). Evidently, the automatic repetition mode was little 
used. Examination of some statistics on the minimum numbers of 
keystrokes for each trial shows there was little need for it. For one thing, 
an average of only six keystrokes was necessary for the text keys to locate 
a target word. Ten or fewer keystrokes were sufficient for over 90% of 
the targets. For another, these keystrokes were distributed across several 
keys, further limiting opportunities to use the repetition mode. The 
PARAGRAPH key was needed on 48% of the trials, the LINE key on
85%, the WORD key on 83%, and the REVERSE key on 81%.
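
As a worked illustration of the text-key fit, the sketch below evaluates it for the average case of six keystrokes and for the ten-keystroke case that covers over 90% of targets. The function name is hypothetical; the coefficients are those of the least squares fit above.

```python
def text_keys_time(n_min: int) -> float:
    """Fitted positioning time (sec) for the text keys, given the
    minimum number of keystrokes needed to reach the target."""
    return 0.66 + 0.209 * n_min


print(round(text_keys_time(6), 2))   # 1.91, the average case
print(round(text_keys_time(10), 2))  # 2.75
```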

Comparison of Devices 

Table 6.3 summarizes the models, the standard errors of the fit, and 
the percentage of variance between the means explained by the model. 

TABLE 6.3
Summary of Models for Positioning Time (T_{pos})

Device      Model (times in sec)                                 Std. error (sec)   R^2

Mouse       T_{pos} = 1.03 + 0.096 \log_2 (D/S + 0.5)            0.07               0.83
Joystick    T_{pos} = 0.99 + 0.220 \log_2 (D/S + 0.5) (a)        0.13               0.89
            T_{pos} = K_D + 0.1 \log_2 (D/S + 0.5) (b)           0.07               -
Step Keys   T_{pos} = 1.20 + 0.052 (D_x/S_x + D_y/S_y) (c)       0.54               0.84
            T_{pos} = 0.98 + 0.074 (D_x/S_x + D_y/S_y) (d)       0.18               0.95
Text Keys   T_{pos} = 0.66 + 0.209 N_{min}                       0.24               0.89

(a) Least squares fit to all data points.
(b) Fitting a separate line with slope 0.1 sec/bit for each distance.
(c) Least squares fit to all data points.
(d) Fit for number of keystrokes (D_x/S_x + D_y/S_y) < 40,
    where HOME key unlikely to be used.

The match of the Fitts's Law slope to the roughly K ≈ 0.1 sec/bit
constant observed in other hand movement and manual control studies
means that positioning time is apparently limited by central information
processing capacities of the eye-hand guidance system (cf. Welford, 1968;
Glencross, 1977). Taking K = 0.08 sec/bit as the most likely minimum
value for a similar movement task, and K_0 = 1 sec as a typical value
observed in this experiment, it would seem unlikely that a continuous
movement device could be developed whose positioning time is less than
1 + 0.08 \log_2 (D/S + 0.5) sec (unless it can somehow reduce the
information which must be centrally processed), although something
might be done to reduce the value of K_0. If this is true, then an optimal
device would be expected to be no more than about 5% faster than the
mouse in the extreme case of 1-character targets 16 cm distant (1 +
0.095 \log_2 [(16/1) + 0.5] = 1.38 sec vs. 1 + 0.08 \log_2 [(16/1) + 0.5] =
1.32 sec). Typical differences would be much less. By comparison in
this same case, the joystick (in this experiment) is 83% slower than the
optimal device, the text keys 107% slower, and the step keys 239%
slower. Even if K_0 were zero, the mouse would still be only 23% slower
than the minimum. While devices might be built which improve on the
mouse's homing time, error rate, or ability for fine movement, it is
unlikely their positioning times will be significantly faster.
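
The comparison between the mouse and a hypothetical optimal device in the extreme case can be checked numerically. The sketch below uses the constants exactly as they appear in the worked arithmetic of the text (K_0 = 1 sec, slopes of 0.095 and 0.08 sec/bit, D/S = 16/1); the function name is illustrative.

```python
import math


def fitts_time(k0: float, k: float, distance: float, size: float) -> float:
    """Fitts's Law positioning time, T = K_0 + K log2(D/S + 0.5)."""
    return k0 + k * math.log2(distance / size + 0.5)


mouse = fitts_time(1.0, 0.095, 16, 1)
optimal = fitts_time(1.0, 0.08, 16, 1)

print(round(mouse, 2))    # 1.38
print(round(optimal, 2))  # 1.32
print(f"mouse slower by {100 * (mouse / optimal - 1):.1f}%")
```

The percentage printed by the last line is what the text rounds to "about 5%".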

This maximum information processing capacity probably explains the 
lack of any significant difference in positioning time between the lightpen 
and the lightgun in Goodwin's experiment. Both are probably Fitts's 
Law devices, so both can be expected to have the same maximum 0.1 
sec/bit rate as the mouse (if they are optimized with respect to 
control/display ratio and any other relevant variables). 

In interpreting these results, highly favorable to the mouse, some 
qualifications are in order. Of the four devices, the mouse is clearly the 
most "compatible" for this task (cf. Poulton, 1974, Chapter 16), meaning
less mental translation is needed to map intended motion of the cursor
into motor movement of the hands than for the other devices. Thus it
would be expected to be easier to use, put lower cognitive load on the
user, and have lower error rates. There are, however, limits to its
compatibility. Inexperienced users are often bewildered about what to
do when they run the mouse into the side of the keyboard trying to 
move the cursor across the screen. They need to be told that their mice 
can simply be picked up and deposited at a more convenient place on 
the table without affecting the cursor. Even experienced users are 
surprised at the results when they hold their mice backwards or sideways. 

The greatest difficulty with the mouse for text-editing occurs with 
small targets. Punctuation marks such as a period are considerably 
smaller than an average character. The error rate for the mouse, which 
was already up to 9% for one character targets, would be even higher for 
these sorts of targets. 



6.4 SUMMARY AND CONCLUSION 



Of the four devices tested the mouse is clearly the superior device for 
text selection on a CRT: 

1. The positioning time of the mouse is significantly faster than that of 
the other devices. This is true overall and at every distance and size
combination save for single character targets.

2. The error rate of the mouse is significantly lower than that of the
other devices.

3. The rate of movement of the mouse is nearly maximal with respect
to the information processing capabilities of the eye-hand guidance 
system. 




As a group the continuous movement devices are superior in both 
speed and error-rate. For the continuous movement devices, positioning 
time is given by Fitts's Law. For the key devices it is proportional to 
the number of keystrokes. 



Simulating User Performance

This chapter carries the GOMS theory of performance in the
manuscript editing task three steps farther. First, it extends the analysis
to behavior with Editor Y, a CRT display-based editor. Second, it
formalizes the model into a running computer simulation program.
Third, it introduces stochastic decision making into the model in order to
predict the distribution of decisions the user will make and the
distributions of times it will take him to perform the modifications.

7.1 ANALYSIS OF THE TASK ENVIRONMENT

Within broad limits, the user will seek to behave in rational ways 
constrained by the structure of his environment. This is the Principle of
Limited Rationality (Newell and Simon, 1972). To predict the behavior
of a person doing a task, it suggests, one must first analyze the task
environment to discover the environment's constraints. In the following
analysis, these constraints are reflected: first, in the limited ways in which
the different objects in the environment can interact; second, in the
structure of the space of editing tasks considered. These two classes of
environmental influences will now be discussed. A third class of
reflections of the task environment shows up in the structure of the model
of the user. It will be considered in a later section.

Transactions 

A user of Editor Y sits before a CRT display with a keyboard for 
character input and a mouse for pointing. There are four main entities
in this environment: (1) the marked-up paper manuscript, (2) the user,
(3) the computer editing system with its keyboard and mouse, and (4) the 
display. We can describe the environment in terms of a set of 
transactions between these entities (see Plate 7.1). The user consults the 
manuscript to find the next task; he seeks from it more information 
about the task under way. The user issues commands to the editor. The 
editor changes the display. The user consults the display for the location 
of a piece of text. Two of these entities, the user and the editor, are 
active, able to initiate transactions. Two entities, the manuscript and the 
display, are passive, speaking only when spoken to. 

By listing the transactions available between entities in the 
environment, we can give a brief description of those entities. This 
description is not intended to describe them completely, but only those 
aspects which affect the structure of the model. 

User → Editor Transactions

    (I:)
    (R:)
    (D:)
    (J: (is #Task))
    (Key: (is #Text))
    (Scroll: (is #Task))
    (Select:)

Plate 7.1. Transactions between entities in the task environment.

These transactions represent schematically the commands available on
the editor. I: is the insertion command. It represents the initiation of
the command (by typing the letter "I" in Editor Y) and the termination
of the command (by typing <ESC> in Editor Y). The typing of text to
be inserted is represented by the Key: transaction. R: is the replace
command, D: the deletion command. J:, the jump command, repositions
the display with the first line to match a typed-in string at the top.
Scroll: repositions the display so that a location pointed to with the 
mouse is at the top or the bottom. Select: instructs the system to mark 
some piece of text for selection. Hitting one button on the mouse selects 
the character nearest the cursor, hitting another selects the nearest word. 
Segments of text are selected by pointing to the beginning, pressing a 
button, pointing to the end, pressing another button. The expression 
(Scroll: (is #Task)) means generically "Scroll down to some task". 

Editor → Display Transactions

    (RepositionTo: (is #Task))

Whenever scrolling or jumps occur the editor repositions the text on
the display.

User → Display Transactions

    (ReadLocation: (is #Task))
        → (is #MainPart) or
          (is #NearBottom) or
          (is #Offscreen)

This is how looking at the display screen to find the task is described. 
Knowing the task for which he is looking, the user looks at the face of 
the display. We describe this as sending a message ReadLocation to the
display. The display makes a reply giving the location of the task. The
exact location on the screen is of little use in predicting his performance;
what is important is whether the task is in the main part of the
screen, the bottom part, or not on the screen at all. Notice that changes
in the location of the task will be reflected in changes to the internal
state of the display entity alone. The user may have an internal memory
of where the task was that may or may not correspond with the display's
state.

User → Manuscript Transactions

    (TurnPage: (is #Direction))
        → OK or
          NoMorePages
    (ReadNextLocation: (is #Task))
        → (is #Task) or
          NoMoreTasksThisPage
    (ReadTargetType: (is #Task))
        → InsertionPoint or
          (is #Character) or
          (is #Word) or
          (is #Textseg)
    (ReadAttribute: (is #Task)
                    (is #TaskAttribute))
        → (is #Attribute)

The user can turn the pages of a manuscript forward or backward. If 
he tries to turn over the last page, he discovers immediately, of course, 
that there are no more pages. A user can look for the task on a page 
which follows a certain task, he can note what sort of target it is he must 
select, or he can read various attributes of a task, such as the new text 
that is to be inserted. In modeling, the manuscript, the user, the editor, 
and the display will each be treated as a separate process with all 
interaction occurring through only these transactions. 
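
The idea of passive entities that reply only to a fixed set of transactions can be sketched in code. What follows is an illustrative sketch, not the simulation program described in this chapter: the class layout, the use of line numbers to stand for tasks, the 30-line screen, and the two-thirds split between the main part and the bottom part are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class Location(Enum):
    MAIN_PART = "MainPart"
    NEAR_BOTTOM = "NearBottom"
    OFFSCREEN = "Offscreen"


@dataclass
class Display:
    """Passive entity: answers ReadLocation, is changed only by the editor."""
    top_line: int = 0
    lines_on_screen: int = 30     # hypothetical screen height
    main_fraction: float = 2 / 3  # hypothetical main/bottom split

    def reposition_to(self, line: int) -> None:
        """Editor -> Display transaction RepositionTo."""
        self.top_line = line

    def read_location(self, task_line: int) -> Location:
        """User -> Display transaction ReadLocation."""
        offset = task_line - self.top_line
        if offset < 0 or offset >= self.lines_on_screen:
            return Location.OFFSCREEN
        if offset < self.lines_on_screen * self.main_fraction:
            return Location.MAIN_PART
        return Location.NEAR_BOTTOM


@dataclass
class Manuscript:
    """Passive entity: each page is a list of line numbers carrying tasks."""
    pages: List[List[int]]
    page: int = 0

    def turn_page(self, direction: int) -> str:
        """User -> Manuscript transaction TurnPage (+1 forward, -1 backward)."""
        new_page = self.page + direction
        if not 0 <= new_page < len(self.pages):
            return "NoMorePages"
        self.page = new_page
        return "OK"

    def read_next_location(self, after_line: int) -> Union[int, str]:
        """User -> Manuscript transaction ReadNextLocation."""
        for line in self.pages[self.page]:
            if line > after_line:
                return line
        return "NoMoreTasksThisPage"


display = Display()
print(display.read_location(25))          # Location.NEAR_BOTTOM
manuscript = Manuscript(pages=[[3, 9], [4]])
print(manuscript.read_next_location(9))   # NoMoreTasksThisPage
```

Keeping the state of each entity private and exposing only these methods mirrors the statement that all interaction occurs through the listed transactions.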

The Space of Tasks 

Now we shall consider how to describe the space of editing tasks in 
the domain of the system. Consider a fragment of a typical marked up 
manuscript (Plate 7.2). For the task labeled A2, the instructions marked 
on the manuscript indicate the character "a" is to be inserted as a word
between "involved" and "necessary." Symbolically we might describe
this task as a tree

    A2 = (#Insertion
            (Function:        Insert)
            (InsertionPoint:  InsertionPoint2)
            (NewText:         Character1)
            (LineNo:
            (Previous:

Chapter I: INTRODUCTION

While the official, chartered purpose of this Subcommittee on Data Base 
Management systems is to investigate the potential for standardization in the 
area of data base management systems, a necessary first step of the work of 
the Subcommittee has been the development of a set of requirements for 
effective data base management s ystems. The se requirements have emerged 
as the work of the Subcommittetwianifested ^^roce eded an d haye) themselves 
in the form of a generahzed model for the description of data base 
management systems. As no existing or proposed implementation of a data 
base management system completely satisfies these requirements nor 
comprises all of the concepts involved, Jlec£sar]^JalScussi(M arsxanaiads is 
an explanation of this model. The bulk of this Report provides such an 
explanation. f\. 

As a preliminary, it is ""**"■" «->'.«-^ *^ ^^ft^aiir wh;^^ ^■iKtiili liiil In lln 
r^'T'"'"*'''" of Tlli'i '^nninnnl*' Among the responsibilities of the 
Specifications Planning and Requirements Task Force of the Ad Hoc 
Marketing Committee for Computers and Information Processing is the 
generation of recommendations for action by the parent Task Force on 
appropriate areas for the initiation of specifications development efforts. For 
some time, starting in about 1969, the task force has been aware that data 
base management systems are becoming central elements of information 
processing systems, and that there is less than full agreement in the 
community on appropriate design. In addition to the existence of a number 
of implementations of such systems, a list that continues to grow, there are 
several documents generated out of the collective wisdom of some segment 
of the information processing commiunity which are either proposals for 
specific systems (SMITH 1971) or more general statements of requirements 
(JAYME 1970). (HO 1971). As is well known, there is a debate in the 
community on whether existing and proposed implementations meet the 
indicated requirements, or whether the requirements as drawn are all really 
necessary and ' Cinifcly i uatfur ^ Further, there have been serious questions 
about the economics of systems meeting all the stated requirements.

Plate 7.2. Sample page of manuscript.

The expression may be read "A2 is an instance of an insertion with a
Function Insert and an InsertionPoint InsertionPoint2 and ...." We shall
limit our consideration to five of the most common instruction types: (1)
instructions to insert some new text, (2) instructions to delete some old
text, (3) instructions to replace some old text by some new text, (4)
instructions to move some text somewhere, and (5) instructions to
transpose two adjacent pieces of text. The page in Plate 7.2 contains four
of these types. These instructions, together with the definition of their
subparts, define a space of editing tasks. Formally that definition is given
in Table 7.1. The symbolic representation for the editing modifications
in Plate 7.2 is given in Table 7.2.
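
One way to picture the task tree is as a small set of record types. The sketch below is an illustration only: it covers just the insertion task, the field names are loose transliterations of the part names in the text, the line number is left unset because its value is not given here, and the choice of "Word" as the boundary of the inserted character is an assumption based on the statement that "a" is inserted as a word.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Text:
    text_type: str  # "Character", "Word", or "Textseg"
    boundary: str
    length: int


@dataclass
class BasicTask:
    task_no: str
    line_no: Optional[int]  # None when the line number is not known
    function: str


@dataclass
class Insertion(BasicTask):
    insertion_point: str
    new_text: Text


# Task A2: insert the single character "a" as a word.
a2 = Insertion(
    task_no="A2",
    line_no=None,
    function="Insert",
    insertion_point="InsertionPoint2",
    new_text=Text(text_type="Character", boundary="Word", length=1),
)
print(a2.function, a2.insertion_point, a2.new_text.length)
```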



7.2 MODEL OF THE USER 

The model to be proposed is an extension of the model of Chapter 4.
In order to extend the model to a display-oriented editor, the behavior of
four users was videotaped and analyzed. One of these was used for 
detailed study, the other three mainly for comparison. 

Method 

Four users were videotaped using Editor Y on two standard 
manuscript editing tasks. All used the system daily in their normal work. 
Two were programmers and two were non-programmers. One of the 
programmers and one of the non-programmers were fast typists (86 and 
73 words per minute), the other two slow typists (39 and 36 words per 
minute). The protocol chosen for detailed analysis belonged to one of 
the fast typists. 

The users were presented with two marked-up manuscripts, each 
containing 33 tasks. There were five types of tasks: insertions, deletions, 
replacements, moves, and transpositions. Text arguments were 
approximately 1, 4, 16, 64, or 512 characters long and were varied in 
their boundaries, whether within a word, on a word boundary, within a 
line, across two lines, or across two pages.

TABLE 7.1
Description of the Space of Editing Tasks within Capability of Model

#Task
    isOneOf   ((is #Insertion)
               (is #Replacement)
               (is #Move)
               (is #Transposition)) ;

#BasicTask (TaskNo: RelTaskNo: LineNo: Function:)
    hasParts  ((TaskNo: (is #Atom))
               (RelTaskNo: (is #Integer))
               (LineNo: (is #Integer))
               (Function: (is #EditFunction))) ;

#Deletion (Function: OldText:)
    isA       #BasicTask
    hasParts  ((Function: Delete)
               (OldText: (is #TextInMs))) ;

#Insertion (Function: InsertionPoint: NewText:)
    isA       #BasicTask
    hasParts  ((Function: Insert)
               (InsertionPoint: (is #PlaceInMs))
               (NewText: (is #Text))) ;

#Replacement (Function: NewText: OldText:)
    isA       #BasicTask
    hasParts  ((Function: Replace)
               (NewText: (is #Text))
               (OldText: (is #TextInMs))) ;

#Move (Function: OldText: InsertionPoint:)
    isA       #BasicTask
    hasParts  ((Function: Move)
               (OldText: (is #TextInMs))
               (InsertionPoint: (is #PlaceInMs))) ;

#Transposition (Function: LeftText: RightText:)
    isA       #BasicTask
    hasParts  ((Function: Transpose)
               (LeftText: (is #TextInMs))
               (RightText: (is #TextInMs))) ;

#TextInMs ()
    isOneOf   ((is #WordInMs)
               (is #CharacterInMs)
               (is #TextsegInMs)) ;

#WordInMs (Location:)
    isA       #Word
    hasParts  ((Location: (is #PlaceInMs))) ;

#CharacterInMs (Location:)
    isA       #Character
    hasParts  ((Location: (is #PlaceInMs))) ;

#TextsegInMs (Location:)
    isA       #Textseg
    hasParts  ((StartLoc: (is #PlaceInMs))
               (EndLoc: (is #PlaceInMs))) ;

#Text
    isOneOf   ((is #Word)
               (is #Character)
               (is #Textseg)) ;

#Word (TextType: Boundary: Length:)
    isA       #Text
    hasParts  ((TextType: Word)
               (Boundary: Word)
               (Length: (is #Integer))) ;

#Character (TextType: Boundary: Length:)
    isA       #Text
    hasParts  ((TextType: Character)
               (Boundary: (is #CharacterBoundary))
               (Length: 1)) ;

#Textseg (TextType: Boundary:)
    isA       #Text
    hasParts  ((TextType: Textseg)
               (Length: (is #Integer))
               (Boundary: (is #TextsegBoundary))) ;

#CharacterBoundary ()
    isOneOf   (InWord Word) ;

#TextsegBoundary ()
    isOneOf   (Line SplitLines SplitPages) ;

#Bounds (Start: End:)
    hasParts  ((Start: (is #PlaceInMs))
               (End: (is #PlaceInMs))) ;

TABLE 7.2




Symbolic Representation of Manuscript

[Table body not recoverable from the source.]

Outline of the Model

According to the model of Chapter 4, a user may be represented as a 
4-tuple <G, O, M, S>, where 

G is a set of goals defining a state of affairs to be achieved, 

O is a set of operators which specify the possible set of 
changes to the user's memory or to the task environment, 

M is a set of methods, that is, sequences of goals and 
operators associated with the intended achievement of 
particular goals, 

S is a set of selection rules for choosing among the multiple 
methods which may be applicable to a goal. 

For the present model the elements of these sets are enumerated in 
Table 7.3. In that table and in what follows we shall follow the 
convention that items beginning with a * are pieces of information that 
must at some time be kept in the user's short term memory. 
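
The 4-tuple can be pictured as a simple data structure. The sketch below is hypothetical throughout: the container layout, the contents of each method, the "*Location" memory key, and in particular the mapping from screen location to pointing method are assumptions made for illustration, not the selection rules of the model. Only the names of the goals, operators, and methods are taken from Table 7.3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GOMS:
    goals: List[str]
    operators: List[str]
    # method name -> sequence of goals and operators (contents hypothetical)
    methods: Dict[str, List[str]]
    # goal -> rule that inspects short term memory and names a method
    selection_rules: Dict[str, Callable[[dict], str]]

    def select_method(self, goal: str, memory: dict) -> List[str]:
        """Apply the goal's selection rule and return the chosen method."""
        return self.methods[self.selection_rules[goal](memory)]


def pointing_rule(memory: dict) -> str:
    """Hypothetical rule: choose a pointing method from the task location."""
    location = memory.get("*Location")
    if location == "MainPart":
        return "PointWithoutScrollingMethod"
    if location == "NearBottom":
        return "ScrollAndPointMethod"
    return "JumpMethod"


model = GOMS(
    goals=["EditMs", "EditTask", "GetTask", "PointToTarget"],
    operators=["GetFromMs", "GetFromDisp", "Scroll", "Jump", "Point"],
    methods={
        "PointWithoutScrollingMethod": ["Point"],
        "ScrollAndPointMethod": ["Scroll", "Point"],
        "JumpMethod": ["Jump", "Point"],
    },
    selection_rules={"PointToTarget": pointing_rule},
)
print(model.select_method("PointToTarget", {"*Location": "NearBottom"}))
```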

Goals. The goals of Table 7.3 form a subgoal hierarchy as pictured in 
Figure 7.3. The user is assumed to have an overall goal EditMs to edit 
the manuscript. Since the instructions written on the manuscript are 
broken down into individual tasks, he will also generate a set of subgoals 
EditTask. In order to do the editing required for one of these tasks, 
however, he must first find out what the task is by setting up the goal 
GetTask. The result of GetTask is one of the goals Replace, Delete, 
Insert, or Move. These goals are qualified by some reference to the new 
text to be inserted (*NewText), the text to be deleted (*OldTextKey), or 
the place where new text is to go (*InsertionPointKey). The precise form 
in which the user has these pieces of knowledge represented in his head 
is not specified. 

In order to select segments of text, the user employs SelectTarget, 
which uses the subgoals PointToTarget and PointThere to handle the 
details of pointing. 

Operators. The user is assumed to have available to him a certain 
number of elementary behavioral acts. Two of these, GetFromMs and 
GetFromDisp have to do with getting information from the environment, 
the manuscript, and the display. Two of the operators describe the users 
actions with tlie mouse. Point is the act of causing the cursor on the 
display to move by means of moving the mouse until it is at a certain 
ScreenPosition described by Key. The argument *Bug? is either "Bug" 
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TABLE 7.3 
GOMS Model for Editor Y 

Goals 

(EditMs) 
(EditTask) 
(GetTask) 
(Insert *InsertionPointKey *NewText) 
(Delete *OldTextKey) 
(Replace *OldTextKey *NewText) 
(Move *InsertionPointKey *OldTextKey) 
(SelectTarget *PlaceInMs *WhichTarget *Target) 
(PointToTarget *PlaceInMs *Target *Bug) 
(PointThere *ScreenPosition *TargetType *Bug) 

Operators 

(GetFromMs DesiredInformation Attribute) 
(GetFromDisp DesiredInformation Attribute *PlaceInMs) 
(Scroll Place) 
(Jump Place) 
(Point ScreenPosition Key Bug?) 
(I) 
(D) 
(R) 
(Key NewText) 

Methods 

OneAtATimeMethod 
GDVMethod 
ReadTaskInMsMethod 
ICommandMethod 
DCommandMethod 
RCommandMethod 
Delete-InsertMethod 
ZeroInMethod 
RoughPointMethod 
CharPointMethod 
WordPointMethod 
TextsegPointMethod 
InsertionPointMethod 
PointWithoutScrollingMethod 
ScrollAndPointMethod 
JumpMethod 

Selection Rules 

RoughLocRule 
TextsegRule 
CharPointRule 
WordPointRule 
InsertionPointRule 
Top2/3Rule 
Bottom1/3Rule 
OffScreenRule 
[Figure 7.3. Hierarchy of goals for Table 7.3.] 
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meaning push a button on the mouse to signal the text under the cursor 
is selected or "Don't Bug." Scroll means to move the cursor to the edge 
of the screen and push a button causing all the text on the screen to 
move up or down. Jump, I, D, and R mean to issue those commands to 
the editor. Key means to type in a text argument. VerifyEdit is to check 
that the task was done correctly. DoTask is the operator which sets up 
as a goal the instruction obtained by GetFromMs. 

Methods. An important part of the model is the means-ends analysis 
which sets out methods for the accomplishment of the various goals. 
Some of these methods will be presented here. Others will be deferred 
until later. 

The method for accomplishing EditMs is simply to proceed one task 
at a time through the manuscript. This method, called the 
OneAtATimeMethod, can be written as follows: 

(OneAtATimeMethod 
    (until *NoMorePages do (EditTask))). 

In terms of the simulation we can think of *NoMorePages as a variable. 
If its value is T (if *NoMorePages is true) the loop will exit. In terms of 
the user this device tests for the existence of some knowledge element 
(*NoMorePages TRUE) in short term memory. 

Another example of a method is that used to edit a single task (that 
is, to achieve the goal EditTask). The user first gets the task into 
memory, does it, then, about 40% of the time, verifies his modification to 
be sure it is correct. This method, the GDVMethod, can be written: 



(GDVMethod 
    (GetTask) 
    (DoTask) 
    (with-probability .4 do (VerifyEdit))) 
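
The two methods above can be read as a small control program. The following Python sketch is only an illustration of that reading, not the simulation used in these studies: the function names, the list-of-strings trace, and the use of a seeded random generator are assumptions introduced here.

```python
import random


def gdv_method(task, rng, p_verify=0.4):
    """Steps for one EditTask goal, following the GDVMethod."""
    steps = ["GetTask", f"DoTask({task})"]
    # (with-probability .4 do (VerifyEdit))
    if rng.random() < p_verify:
        steps.append("VerifyEdit")
    return steps


def one_at_a_time_method(tasks, rng):
    """Steps for the EditMs goal: proceed one task at a time."""
    trace = []
    for task in tasks:  # the loop exits when no tasks (pages) remain
        trace.append(gdv_method(task, rng))
    return trace


rng = random.Random(0)  # seeded only so that the sketch is repeatable
trace = one_at_a_time_method(["A1", "A2", "A3"], rng)
for steps in trace:
    print(" ".join(steps))
```

Each task always produces GetTask followed by DoTask; VerifyEdit appears on roughly 40% of tasks over a long run.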

The method for getting the task can be expressed directly in terms of 
the GetFromMs operator. 

(ReadTaskInMsMethod 
    (GetFromMs '*Task)) 

The need to distinguish between a symbol itself and some other symbol 
associated with the first symbol unfortunately leads to the notational 
encumbrance of prefixing a single quotation mark to indicate the former. 
If we think of the *'d elements as representing the names of slots in 
short term memory, the above expression gives the operator GetFromMs 
the name of the slot rather than the contents. 

The method for achieving an Insert is slightly more complicated. 

(ICommandMethod 
    (if ~ *InsertionPointKey 
        then (GetFromMs '*InsertionPointKey)) 
    (SelectTarget *PlaceInMs 'InsertionPoint: 
        '*InsertionPointKey) 
    (I) 
    (if ~ *NewText then (GetFromMs '*NewText)) 
    (if *NewText ~= 'Default then (Key *NewText))) 

If the user does not know where to make the insertion he looks over to 
the manuscript to find out. Then, with the mouse, he selects that place 
and issues the insertion command to the editor. If he cannot remember 
the text to be inserted he consults the manuscript. Finally, except in the 
special "Default" case where the text to be inserted is the argument to a 
previous command (for example the delete command) the user types in 
the new text. The methods for the other commands are similar. 

Selection rules. When more than a single method is available, the 
model uses "SelectionRules" to choose among them. A simple example 
is the goal PointToTarget. In Editor Y, there are at least three major 
alternative methods for this goal: (1) to select a character, the user 
moves the mouse and presses the first button on it; (2) to select a word, 
he moves the mouse and pushes another button; (3) to select a text 
segment, he moves the mouse to point to the beginning of the text 
segment, pushes a button, moves the mouse to point to the end of the 
segment, and pushes a button. The corresponding selection rule set may be 
written: 

(CharPointRule 
    (if (is #Character *Target) 
        then (CHOOSE CharPointMethod))) 

(WordPointRule 
    (if (is #Word *Target) 
        then (CHOOSE WordPointMethod))) 

(TextsegRule 
    (if (is #Textseg *Target) 
        then (CHOOSE TextsegPointMethod))) 
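
A selection rule set of this kind amounts to a table of conditions paired with methods. As a minimal sketch (the table layout, the function name, and the error raised when no rule applies are assumptions made for illustration), it could be written in Python as:

```python
# Each entry: (rule name, required target type, method chosen).
SELECTION_RULES = [
    ("CharPointRule", "Character", "CharPointMethod"),
    ("WordPointRule", "Word", "WordPointMethod"),
    ("TextsegRule", "Textseg", "TextsegPointMethod"),
]


def choose_method(target_type):
    """Return (rule, method) for the first rule whose condition holds."""
    for rule, required_type, method in SELECTION_RULES:
        if target_type == required_type:  # e.g. (is #Word *Target)
            return rule, method
    raise ValueError(f"no selection rule applies to {target_type!r}")


print(choose_method("Word"))
```

Because the three conditions are mutually exclusive, the order of the rules in the table does not affect the choice.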
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Detailed Studies 

While it was possible, above, to spell out the general outlines of a 
model, the settling of several issues depends on more detailed analysis. 

Reasons for and frequency of GetFromMs. It is easily observed that 
the user often consults the manuscript several times during the course of 
a task. Under what conditions and how frequently will he consult the 
manuscript? And in particular, how can a method be written to describe 
the variable number of consultations the user requires in pointing to a 
target? 

User S1's performance on the sixteen insertion tasks in the two 
manuscripts was examined with respect to the use of the GetFromMs 
operator. There were 40 instances of such an operator. Three different 
pieces of information the user sought to pick up from the manuscript 
could be identified: the location of the task, the operation to be 
performed, and the new text to be inserted. Of course, from a single 
look at the manuscript the user often picked up more than a single piece 
of information. Table 7.4 shows the distribution of reasons inferred for 
looking at the manuscript arranged by the number of characters in the 
new string. Each line in the table is a separate task. Each entry is the 
number of times the user looked at the manuscript for that reason in that 
task. 

For example, on task A1, the user consulted the manuscript once at 
the beginning of the task. Since she proceeded to point at the target and 
then insert the new text without further consultations, she must have 
obtained all of these on the first look. It is therefore inferred that the 
location of the task, the operation to be performed, and the text to be 
inserted were all absorbed in that single consultation. 

But on task A18, she looked once at the beginning of the task, then 
twice more at the manuscript before finally pointing to the target. Then 
she looked again before beginning to type in the new text. While typing 
she glanced back at the keyboard twice. From the first look she 
probably learned the approximate location of the task and the operation 
to be performed. On the second look she got another rough location of 
the target insertion point. On the third she learned the exact target 
position and on the fourth glance got the beginning of the text to be 
inserted. At this point she proceeded to type while watching the 
manuscript with small glances back to the display or keyboard to check 
for suspected errors or locate different keys (see Long, 1977). These 
glances are tallied in the "Type-watching" column. Since this sort of 
GetFromMs overlays the Key operations it will be hereafter ignored. 
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TABLE 7.4 
Reason and Frequency for looking at Manuscript 



Task N chars. 



Next 
Task 
(GNT) 



Reason for Looking 
at Manuscript 



Target 

Location 

(GL) 



New 
Task 
(GN) 



Total Type- 
watching 



A1 










1 




A21 







1 


2 




B8 


1 1 


3 


1 


5 




B26 










1 




A6 




1 





2 




A32 




2 





3 




B2 


4.7 1 








1 




B23 










1 




A3 







1 


2 




A14 




1 


1 


3 


1 


B1 


18.2 1 


1 


1 


3 




B16 




1 


2 


4 


1 


A18 




2 


1 


4 


2 


B6 


75 1 


1 


1 


3 


2 


A30 




1 


1 


3 


4 


B10 


522 1 








1 


7 



The procedure for locating a target (GL column) is especially 
interesting. It typically goes as follows. First the user extracts a few 
words from the manuscript to use as a key. Either the words may be the 
exact target or some other words or characters she thinks may be 
heuristically useful. In either case she uses the mouse to point to the 
key. If the key is only a rough approximation to the target she does not 
bug the target, but looks over to the manuscript and repeats the 
procedure. Otherwise she bugs the target and moves on to the next step 
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TABLE 7.5 

Comparison of Number of GL Operations in 
Sequence with Poisson Distribution 

Number of                     Frequency 
GL Operators      Observed        Predicted 
in Sequence                       16 (0.81^i e^-0.81 / i!) 

    0                7               7.1 
    1                6               5.8 
    2                2               2.3 
    3                1               0.6 
    4                0               0.1 



of the task. This method of locating the target is here called the 
ZeroInMethod and may be described, 

(ZeroInMethod 
    (while (is #RoughKey *Target) 
        do (PointToTarget *PlaceInMs *Target 'Don'tBug) 
           (GetFromMs '*Target *WhichTarget) 
        finally (PointToTarget *PlaceInMs *Target 'Bug))). 

*Target is the identifying key extracted from the manuscript by the user. 
*PlaceInMs represents her memory for which task she is doing. 
*WhichTarget identifies which of several possible targets she is 
considering (for example a move task has an InsertionPoint and an 
OldText). 
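
The operator sequence that the ZeroInMethod generates depends only on how many rough keys are picked up before the exact one. The sketch below illustrates this, writing PR for a point without bugging, GL for a GetFromMs of the target, and PB for a point with bugging; the function name and the idea of passing the number of rough keys as an argument are assumptions made for the example.

```python
def zero_in_sequence(n_rough_keys):
    """Operator sequence produced by ZeroInMethod for a given number of rough keys."""
    ops = []
    for _ in range(n_rough_keys):  # while (is #RoughKey *Target)
        ops.append("PR")           # PointToTarget ... 'Don'tBug
        ops.append("GL")           # GetFromMs '*Target
    ops.append("PB")               # finally: PointToTarget ... 'Bug
    return ops


print(" ".join(zero_in_sequence(0)))
print(" ".join(zero_in_sequence(2)))
```

With no rough keys the user bugs the target directly; each rough key adds one PR GL pair ahead of the final PB.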

Although it has not been possible to decide for any task how many 
times (GetFromMs '*Target) will be invoked in succession (and in an 
engineering analysis, such a prediction would usually need to be done in 
the absence of a particular manuscript), the numbers in Table 7.4 are 
well approximated by a Poisson distribution of mean 0.81 (see Table 7.5). 
Hence the operator (GetFromMs '*Target) should be constructed so as to 
pick up rough location keys in such proportion that the number of 
iterations will be Poisson distributed. 
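
The predicted frequencies in Table 7.5 follow from multiplying the Poisson probabilities for a mean of 0.81 by the 16 insertion tasks. A short Python check, in which the function name is an assumption introduced here:

```python
import math


def poisson_expected(n_tasks, mean, max_count):
    """Expected number of tasks showing 0..max_count GL operators in sequence."""
    return [
        n_tasks * math.exp(-mean) * mean**i / math.factorial(i)
        for i in range(max_count + 1)
    ]


predicted = [round(x, 1) for x in poisson_expected(16, 0.81, 4)]
print(predicted)  # [7.1, 5.8, 2.3, 0.6, 0.1]
```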
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TABLE 7.6 

Frequency with which Subject 1 Employed Alternative 
Methods for Goal PointThere as a Function 
of Distance of Target from Top of Screen 

Lines from        PointWithoutScrolling   ScrollAndPoint   Jump 
Top of Screen     Method                  Method           Method 



On Screen 


1-4 


6 




5-8 


9 




9-12 


3 




13-16 


2 




17-20 


1 




21-24 




Off Screen 


25-28 
29-32 
33-36 
37-40 





1 

2 

2 1 

1 1 

1 
3 



Scrolling. On a given task, the user might not reposition the screen 
at all. Or, he might "scroll" with the mouse to reposition the text up or 
down by a few lines on the screen (by moving the mouse to the left side 
of the screen and pushing a button the display can be made to jump a 
certain number of lines up or down). Or he might "jump" to a new 
location (by typing the letter J followed by some string of characters the 
text can be repositioned so that the first following instance of these 
characters will be at the top of the display). How can a set of selection 
rules be written which will predict the user's choice? 

On each task the user has a choice of methods for adjusting the 
portion of the text file being displayed. In order to examine the way in 
which the user does scrolling, S1's performance on all of the tasks in the 
first manuscript was examined. For each task the number of lines from 
the top of the screen to the target was counted, and whether and how the 
user caused the text to be repositioned on the screen was recorded. 

Table 7.6 shows the number of times the user adopted each of these 
methods as a function of the distance of the target from the top of the 
screen. The results for this subject may be simply expressed: if the target 
is in the top two-thirds of the screen, she does not reposition the screen; 
if the target is in the bottom third of the screen, she scrolls; if the target 
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is off the bottom of the screen, she uses the jump command: 

(Top2/3Rule 
    (if (is #MainPart *ScreenPosition) 
        then (CHOOSE PointWithoutScrollingMethod))) 

(Bottom1/3Rule 
    (if (is #NearBottom *ScreenPosition) 
        then (CHOOSE ScrollAndPointMethod))) 

(OffScreenRule 
    (if (is #OffScreen *ScreenPosition) 
        then (CHOOSE JumpMethod))) 

The MainPart of the screen is from 1 to 19 lines, the NearBottom part 
from 20 to 24 lines. All others are OffScreen. 
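
As an illustration of how these three rules partition the possible target positions, the following Python sketch classifies a target by its line number from the top of the screen and then looks up the method. The function and dictionary names are assumptions made for the example; the line boundaries are the ones stated above.

```python
def classify_screen_position(line):
    """Map a line number (counted from the top of the screen) to a screen region."""
    if 1 <= line <= 19:
        return "MainPart"
    if 20 <= line <= 24:
        return "NearBottom"
    return "OffScreen"  # all other positions


POINT_THERE_RULES = {
    "MainPart": "PointWithoutScrollingMethod",  # Top2/3Rule
    "NearBottom": "ScrollAndPointMethod",       # Bottom1/3Rule
    "OffScreen": "JumpMethod",                  # OffScreenRule
}


def choose_point_there_method(line):
    return POINT_THERE_RULES[classify_screen_position(line)]


for line in (5, 22, 30):
    print(line, choose_point_there_method(line))
```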

This set of selection rules has the advantage that it makes clear the 
mechanism whereby the user makes his choice. It has the disadvantage 
that it demands knowing the state of the screen at some arbitrary point 
in the editing process. It would therefore be useful if there were a 
reasonably accurate set of selection rules which did not demand such a 
detailed knowledge of the situation. 

The distance between two targets of the manuscript is easily 
determined by inspection of the manuscript alone. Table 7.7 shows the 
selection of methods as a function of Dy, the number of lines from the last 
target. The selection rules are not as precise and make more errors, but 
have the considerable advantage that they can be determined without 
consideration of the details of how the screen will be updated. A set of 
selection rules based on Dy is given below. 

(LittleDyRule 
    (if Dy < 16 
        then (CHOOSE PointWithoutScrollingMethod))) 

(MediumDyRule 
    (if 16 <= Dy < 25 
        then (CHOOSE ScrollAndPointMethod))) 

(BigDyRule 
    (if Dy > 25 
        then (CHOOSE JumpMethod))) 
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TABLE 7.7 

Frequency with which Users Employed Alternative Methods 
for Goal PointThere as a Function of Distance Between Tasks 



Distance from           S1                    S2             S3             S4 
Previous Task    PWSM(a) SAPM(b) JM(c)    PWSM  SAPM     PWSM  SAPM     PWSM  SAPM 





12 




* 11 


2 


13 




12 1 


1 


16 




15 


1 


15 


1 


15 1 


4 


11 2 




10 


3 


8 


5 


7 6 


16 


3 6 


7 


4 


11 


2 


13 


2 13 


32 


7 


7 


1 


6 




7 


3 4 



(a) PointWithoutScrollingMethod 
(b) ScrollAndPointMethod 
(c) JumpMethod 
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How well do these two rules predict the user's method selection? Table 
7.8 gives a number of statistics calculated for each rule. The hit rate is 
the percentage of time the rule was correct. The $\chi^2$ statistic can be used 
to test the likelihood that these results could arise from chance. A 
normalized statistic which indexes the goodness-of-fit is the "correlation 
of attributes" $r = (\chi^2 / N(k - 1))^{1/2}$. While the rule set based on the 
position of the target on the screen is better, the other rule set based on 
the distance between targets on the manuscript also makes a good 
showing. 
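
The two summary statistics can be sketched in Python as follows. This is a hedged illustration only: the observations are invented for the example and are not data from these studies, the treatment of the boundary values of Dy as scrolling is an assumption, and N and k are taken here to be the number of observations and the number of alternative methods, which the text does not state explicitly.

```python
import math


def dy_rule(dy):
    """Dy-based rule set; boundary values are treated as scrolling (an assumption)."""
    if dy < 16:
        return "PointWithoutScrollingMethod"
    if dy > 25:
        return "JumpMethod"
    return "ScrollAndPointMethod"


def hit_rate(rule, observations):
    """Fraction of (Dy, observed method) pairs that the rule predicts correctly."""
    hits = sum(1 for dy, method in observations if rule(dy) == method)
    return hits / len(observations)


def correlation_of_attributes(chi2, n, k):
    """r = (chi2 / (N (k - 1)))^(1/2)."""
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical observations, for illustration only.
observations = [
    (3, "PointWithoutScrollingMethod"),
    (20, "ScrollAndPointMethod"),
    (30, "JumpMethod"),
    (10, "ScrollAndPointMethod"),  # a miss: the rule predicts no scrolling
]
print(hit_rate(dy_rule, observations))          # 0.75
print(correlation_of_attributes(32.0, 32, 2))   # 1.0
```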

This result encourages us to use the simpler distance between targets 
measure to examine the behavior of other users to see how stable these 
methods are across users. Table 7.7 shows the frequency with which the 
three other users used the different pointing methods. The main 
difference between these users and the first user is that they do not have 
the JumpMethod in their repertoires. All three switch from no movement 
to the ScrollMethod as the distance increases. The cross-over point varies 
from a distance of 4 lines between targets up to 11 lines. It would thus 
appear that the selection rule set 

(LittleDy2Rule 
    (if Dy < 8 
        then (CHOOSE PointWithoutScrollingMethod))) 

(BigDy2Rule 
    (if Dy > 8 
        then (CHOOSE ScrollAndPointMethod))) 

is a reasonable rule set for those users who do not use the Jump method. 
If scrolling is the only means employed to move the text on the 
display, then almost independent of the distribution of tasks on the 
manuscript, the amount of scrolling will be determined by the length of 
the manuscript. From examination of the data from S2, S3, and S4, 
there are approximately 16 lines/scroll, hence the number of scrolls a 
user who does not use the Jump or Find command can be expected to do 
is given by 

Total scrolls = 0.07 × Total lines in manuscript. 

Table 7.9 shows the number of lines per scroll computed for individual 
users. Reasonable scrolling behavior may be approximated by having the 
model scroll 16 lines at a time. 
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TABLE 7.8 
Goodness-of-Fit for Alternative Selection Rule Sets for Goal PointThere 

                              Manuscript 1                 Manuscript 2 
Rule Set          User   Hit Rate   χ²        r        Hit Rate   χ²       r 

Top2/3Rule        S1     0.85      35.90(a)  0.74(b) 
Bottom1/3Rule 
OffScreenRule 

BigDyRule         S1     0.85      30.97     0.68      0.75      10.29    0.40 
LittleDyRule 

BigDy2Rule        S2     0.94      24.04     0.87      0.85      14.73    0.67 
LittleDy2Rule     S3     0.94      24.92     0.87      0.85      14.73    0.67 
                  S4     0.88      17.84     0.74      0.91      21.22    0.80 

(a) All χ² in table significantly different from 0, p < 0.01. 
(b) All r in table significantly different from 0, p < 0.05. 
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TABLE 7.9 
Average Number of Lines per Scroll for Different Users 

               Number of Lines per Scroll 
            Non-Error      Error        All 
User          Tasks        Tasks       Tasks 

S1            16.3         15.5        17.5 
S2            16.3         21.5        17.7 
S3            12.7          7.9        11.6 
S4            20.2         11.6        16.1 

Average       16.4         14.1        15.7 



Statement of the Model 

Now the full model can be stated. The GOMS elements are given in 
Table 7.10. These are stated in a specialized notation which collects 
together the methods and selection rules for a goal. The concepts of the 
model and the full description of the two manuscripts are given in the 
Appendix. 

A trace of the model for task A2 is given in Table 7.11. The 
elements preceded by a ► symbol are operators. Those preceded by □ 
are transactions. Table 7.11 is only one of the possible sequences the 
model predicts for this task. Adopting the abbreviations 

GNT = (GetFromMs *Task) 
GL = (GetFromMs *Target ...) 
GN = (GetFromMs *NewText ...) 
PR = (Point ... Don'tBug) 
PB = (Point ... Bug) 

and writing down only the sequence of operator firings predicted by the 
model gives the more compact version 



GNT PR GL PB I 16 VE. 
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TABLE 7.10 
Methods and Selection Rules for Editor Y 



(GOAL EditMs 
  (METHODS 
    (OneAtATimeMethod 
      (until *EndOfJob do (EditTask))))) 
(GOAL EditTask () 
  (METHODS 
    (GDVMethod 
      (GetTask) 
      (DoTask *Task) 
      (with-probability .4 do (VerifyEdit))))) 
(GOAL GetTask () 
  (METHODS 
    (ReadTaskInMsMethod 
      (GetFromMs '*Task)))) 
(GOAL Insert (*InsertionPointKey *NewText) 
  (METHODS 
    (ICommandMethod 
      (if ~ *InsertionPointKey then (GetFromMs '*InsertionPointKey)) 
      (SelectTarget *PlaceInMs 'InsertionPoint: *InsertionPointKey) 
      (I) 
      (if ~ *NewText then (GetFromMs '*NewText)) 
      (if *NewText ~= 'Default then (Key *NewText))))) 
(GOAL Delete (*OldTextKey) 
  (METHODS 
    (DCommandMethod 
      (if ~ *OldTextKey then (GetFromMs '*OldTextKey)) 
      (SelectTarget *PlaceInMs 'OldText: *OldTextKey) 
      (D)))) 
(GOAL Replace (*OldTextKey *NewText) 
  (METHODS 
    (RCommandMethod 
      (if ~ *OldTextKey then (GetFromMs '*OldTextKey)) 
      (SelectTarget *PlaceInMs 'OldText: *OldTextKey) 
      (R) 
      (if ~ *NewText then (GetFromMs '*NewText)) 
      (if *NewText ~= 'Default then (Key *NewText))))) 
(GOAL Move (*InsertionPointKey *OldTextKey) 
  (METHODS 
    (Delete-InsertMethod 
      (Delete *OldTextKey) 
      (Insert *InsertionPointKey 'Default)))) 
(GOAL SelectTarget (*PlaceInMs *WhichTarget *Target) 
  (METHODS 
    (ZeroInMethod 
      (while (is #RoughKey *Target) 
        do (PointToTarget *PlaceInMs *Target 'Don'tBug) 
           (GetFromMs '*Target *WhichTarget) 
        finally (PointToTarget *PlaceInMs *Target 'Bug))))) 
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TABLE 7.10 

(continued) 



(GOAL PointToTarget (*PlaceInMs *Target *Bug) 
  (SELECTION-RULES 
    (RoughLocRule 
      (if (is #RoughKey *Target) then (CHOOSE RoughPointMethod))) 
    (TextsegRule 
      (if (is #Textseg *Target) then (CHOOSE TextsegPointMethod))) 
    (CharPointRule 
      (if (is #Character *Target) then (CHOOSE CharPointMethod))) 
    (WordPointRule 
      (if (is #Word *Target) then (CHOOSE WordPointMethod))) 
    (InsertionPointRule 
      (if (is #PlaceInMs *Target) then (CHOOSE InsertionPointMethod)))) 
  (METHODS 
    (RoughPointMethod 
      (GetFromDisp '*ScreenPosition *Target *PlaceInMs) 
      (PointThere *ScreenPosition 'Word *Bug)) 
    (CharPointMethod 
      (GetFromDisp '*ScreenPosition (the Location: of *Target) *PlaceInMs) 
      (PointThere *ScreenPosition 'Character *Bug)) 
    (WordPointMethod 
      (GetFromDisp '*ScreenPosition (the Location: of *Target) *PlaceInMs) 
      (PointThere *ScreenPosition 'Word *Bug)) 
    (TextsegPointMethod 
      (GetFromDisp '*ScreenPosition (the StartLoc: of *Target) *PlaceInMs) 
      (PointThere *ScreenPosition 'Character *Bug) 
      (GetFromDisp '*ScreenPosition 'Character *Bug) 
      (PointThere *ScreenPosition 'Character *Bug)) 
    (InsertionPointMethod 
      (GetFromDisp '*ScreenPosition *Target *PlaceInMs) 
      (PointThere *ScreenPosition 'Character *Bug)))) 
(GOAL PointThere (*ScreenPosition *TargetType *Bug) 
  (SELECTION-RULES 
    (Top2/3Rule 
      (if (is #MainPart *ScreenPosition) 
        then (CHOOSE PointWithoutScrollingMethod))) 
    (Bottom1/3Rule 
      (if (is #NearBottom *ScreenPosition) 
        then (CHOOSE ScrollAndPointMethod))) 
    (OffScreenRule 
      (if (is #OffScreen *ScreenPosition) 
        then (CHOOSE JumpMethod)))) 
  (METHODS 
    (PointWithoutScrollingMethod 
      (Point *ScreenPosition *TargetType *Bug)) 
    (ScrollAndPointMethod 
      (Scroll *PlaceInMs) 
      (Point *ScreenPosition *TargetType *Bug)) 
    (JumpMethod 
      (Jump *PlaceInMs) 
      (Point *ScreenPosition *TargetType *Bug)))) 
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TABLE 7.11 
Trace of Simulation Model for Sequence 8 of Task A2 

(EditTask) 
  The only method is GDVMethod 
  Use GDVMethod 
  (GetTask) 
    The only method is ReadTaskInMsMethod 
    Use ReadTaskInMsMethod 
    ► (GetFromMs *Task NIL) 
      □ (Manuscript1 (ReadNextLocation: A1)) → A2 
      □ (Manuscript1 (ReadAttribute: A2 Function:)) → Insert 
      □ (Manuscript1 (ReadAttribute: A2 NewText:)) → Character1 
  (Insert (is #RoughKey) Character1) 
    The only method is ICommandMethod 
    Use ICommandMethod 
    (SelectTarget A2 InsertionPoint: (is #RoughKey)) 
      The only method is ZeroInMethod 
      Use ZeroInMethod 
      (PointToTarget A2 (is #RoughKey) Don'tBug) 
        RoughLocRule recommends RoughPointMethod 
        Use RoughPointMethod 
        ► (GetFromDisp *ScreenPosition (is #RoughKey) A2) 
          □ (Display (ReadLocation: A2)) → (is #MainPart) 
        (PointThere (is #MainPart) Word Don'tBug) 
          Top2/3Rule recommends PointWithoutScrollingMethod 
          Use PointWithoutScrollingMethod 
          ► (Point (is #MainPart) Word Don'tBug) 
      ► (GetFromMs *Target InsertionPoint:) 
        □ (Manuscript1 (ReadAttribute: A2 InsertionPoint:)) → InsertionPoint2 
      (PointToTarget A2 InsertionPoint2 Bug) 
        InsertionPointRule recommends InsertionPointMethod 
        Use InsertionPointMethod 
        ► (GetFromDisp *ScreenPosition InsertionPoint2 A2) 
          □ (Display (ReadLocation: A2)) → (is #MainPart) 
        (PointThere (is #MainPart) Character Bug) 
          Top2/3Rule recommends PointWithoutScrollingMethod 
          Use PointWithoutScrollingMethod 
          ► (Point (is #MainPart) Character Bug) 
            □ (User1 SelectionMade:) → NIL 
            □ (EditorY Bug:) → Ready 
    ► (I) 
      □ (EditorY (I:)) → Ready 
    ► (Key Character1) 
      □ (EditorY (Key: NewText)) → NewText 
  ► (VerifyEdit) 
By running the simulation model several times the model can be used to 
make Monte Carlo predictions (Table 7.12) of: 

1. the set of possible operator sequences the user will employ to 
do an editing task; 

2. the relative frequency with which the different operator 
sequences will be employed; 

3. the distribution of time for each of these sequences; (The 
table gives the means, standard deviations, and 5th and 95th 
percentile limits.) 

4. and finally, the distribution of times for the whole task. 

By combining the predictions for each task, the model can be used to 
make predictions for an entire manuscript. 
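
A minimal sketch of such a Monte Carlo run for a single operator sequence is given below. It is not the simulation program used in these studies. It assumes that each operator time is drawn from a gamma distribution whose mean and standard deviation are the estimates of Table 7.13, and it assumes a particular expansion of the compact sequence "GNT PB I 1" into timed components (user command time plus system response time for I, and one keystroke); both choices are illustrative.

```python
import random
import statistics

# (mean, sd) in seconds, taken from Table 7.13.
PARAMS = {
    "GetFromMs": (2.1, 0.9),
    "Point": (1.7, 1.3),
    "Bug": (0.3, 0.2),
    "Command": (0.8, 0.6),          # user time for I, D, or R
    "CommandResponse": (1.1, 0.4),  # system response to I, D, or R
    "VerifyEdit": (1.1, 1.0),
    "Keystroke": (0.127, 0.064),
}


def gamma_time(rng, mean, sd):
    """Draw one time from a gamma distribution with the given mean and sd."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return rng.gammavariate(shape, scale)


def simulate_sequence(rng, operators):
    """Total time for one simulated execution of an operator sequence."""
    return sum(gamma_time(rng, *PARAMS[op]) for op in operators)


# Assumed expansion of the compact sequence "GNT PB I 1".
SEQUENCE = ["GetFromMs", "Point", "Bug", "Command", "CommandResponse", "Keystroke"]

rng = random.Random(42)
times = sorted(simulate_sequence(rng, SEQUENCE) for _ in range(1000))
mean = statistics.mean(times)
sd = statistics.stdev(times)
p5 = times[int(0.05 * len(times))]
p95 = times[int(0.95 * len(times))]
print(f"mean={mean:.1f} sd={sd:.1f} 5th={p5:.1f} 95th={p95:.1f}")
```

Repeating this for every sequence the model can generate, and weighting by the frequency with which each sequence occurs, gives the kind of summary shown in Table 7.12.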

Estimation of Parameters 

In order to make time predictions with the model it is necessary to 
make numerical estimates of several of its parameters. Because a model 
such as this one is to be used to predict user behavior in advance, and 
since there is no established methodology for optimizing a simulation 
model with such a large set of parameters, we do not seek that set of 
parameters which will optimize the fit of the model to the data. Rather, 
we attempt to make reasonable estimates of the parameters in advance, 
then test the predictions of the model against experimental evidence to 
see how well its predictions fared. 

The estimates for the parameters are summarized in Table 7.13. The 
times for operators GetFromMs and TurnPage are taken from Chapter 4. 
The time to Point with the mouse is taken from Chapter 6. Since the 
data in that study were pointing times in isolation, the measurements 
were confirmed by comparing with measurements of S1 in the present 
editing task. The time for Bugging with the mouse is set at one reaction 
time, 0.3 sec. Each of the commands I, R, and D is assumed to take 
about the same time for command invocation, plus additional time for 
typing in arguments. The command invocation is estimated to be like 
doing two SPECIFY-CMD operations in POET, since there is the 
command name to be typed and an <ESC> character at the end. The 
time for VerifyEdit is based on the time previously measured for Editor 
Y (based on 12 measurements). Keystroke time is based on an average 
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TABLE 7.12 

Predicted Task Sequences and Simulation Statistics for Task A2 
(Times in sec) 

Seq No  Sequence                            Freq   Mean    SD   5th-%tile  95th-%tile 

   1    GNT PB I GN 1                        17     8.1    1.9     5.5       12.8 
   2    GNT PB I 1                           15     5.2    1.5     3.1        7.4 
   3    GNT PR GL PB I GN 1                  11    11.5    2.5     8.2       17.4 
   4    GNT PB I 1 V                         10     7.0    2.4     4.5       12.1 
   5    GNT PR GL PB I 1                      9     8.1    1.4     6.0        9.8 
   6    GNT PR GL PR GL PB I GN 1             8    15.6    2.7    11.7       20.1 
   7    GNT PB I GN 1 V                       7     7.7    0.9     7.0        9.6 
   8    GNT PR GL PB I 1 V                    6     9.9    3.3     7.4       14.2 
   9    GNT PR GL PR GL PB I 1                5    12.0    3.3     9.2       17.1 
  10    GNT PR GL PB I GN 1 V                 5    11.5    2.0     9.3       14.2 
  11    GNT PR GL PR GL PR GL PB I 1 V        2    17.2    1.8    15.9       18.5 
  12    GNT PR GL PR GL PB I 1 V              2    12.5    5.2     8.8       16.1 
  13    GNT PR GL PR GL PR GL PB I 1          2    17.6    0.8    17.0       18.1 
  14    GNT PR GL PR GL PB I GN 1 V           1    13.1     -     13.1       13.1 

        Overall                             100     9.5    3.9     4.3       17.4 
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TABLE 7.13 
Parameter Estimates for Model 

                  Estimated Time (sec) 
Parameter           Mean      SD      Source 

User Parameters 
GetFromMs           2.1       0.9     Card, Moran, and Newell (1976), Figure 5.2 
TurnPage            2.1       1.4     Card, Moran, and Newell (1976), Figure 5.2 
Scroll              2.6       1.4     Measurement of 10 instances 
Point               1.7       1.3     Card, English, and Burr (1977), Table 1 
Bug                 0.3       0.2     1 reaction time 
I, D, R             0.8       0.6     Card, Moran, and Newell (1976), Figure 5.2 
                                      (two SPECIFY-CMD's) 
VerifyEdit          1.1       1.0     Measurement of 12 instances 
KeystrokeTime       0.127     0.064   Average of two typing tests, SD = 0.5 Mean 

System Parameters 
I, D, R             1.1       0.4     Measured response time of 25 instances 
J                   1.0       1.0     Measured response time of 10 instances 
Scroll              1.7       1.2     Measured response time of 10 instances 
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of two typing tests embedded in an editing exercize given to the user 
before the start of the experiment as a warmup. The standard deviation 
for the keystroke time is estimated by multiplying the mean time per 
keystroke by a typical coefficient of variation for typing of 0.5 (Kinkaid, 
1975). 

In order to estimate response time of the system, 25 I:, D:, or R: 
command invocations were measured. Since there was no obvious 
difference among the times taken by these commands, their measured 
times were pooled to give a common estimated time. Ten invocations of 
J: commands and ten of Scroll: were also measured. These parameter 
estimates are necessarily sketchy, given the labor of obtaining them. But, 
in an exploratory model they suffice to give us a feeling for how well the 
model does in its prediction. Engineering use of such a model would 
most likely use equally rough parameter estimates. 



7.3 PREDICTIONS OF MODEL 

The model makes predictions for the sequence of operations to be 
used, the frequencies with which these sequences occur, the time for 
editing a manuscript, and the standard deviation of editing times. 
Comparison of these predictions with available data indicates that the 
model is about as good as running another subject. 

Prediction of Operator Sequences 

In order to predict the operator sequences for Manuscriptl, 100 
Monte Carlo runs were made for each task in the manuscript. A detailed 
analysis of the operators actually used by user S1 for the tasks in this 
manuscript was completed by hand from the videotapes. As illustrated 
in Table 7.12, the model predicts several different sequences which might 
be used by the user. Since on a given trial the user might have used any 
one of these sequences, the predicted sequence closest to the observed 
sequence was selected. The accuracy with which the model predicted the 
sequences was determined by first examining how well the closest 
sequence matched, and then checking whether these sequences occurred 
with the predicted frequency. 

There is no accepted statistic with which to summarize goodness-of- 
match, so this was assessed in several ways. The simplest method is to 
just note how many sequences were matched exactly. For tasks done 
correctly by the user, the model produced an exact match half of the 
time. Tasks in which the user made an error were never matched exactly 
by the model. 
Another method by which goodness-of-match can be assessed is to 
ignore the sequence properties and correlate the frequency with which 
operators are predicted to occur against the frequency with which they 
do occur. For each task the correlation between the sequence observed 
for S1 and the (best matching) predicted sequence was computed (see 
Table 7.14). 

The model predicts well the frequencies of operators for tasks the 
user performed correctly. The average correlation is 0.94. All of the 
correlations for these tasks are significantly different from 0 at p < 0.01. 

The model does not attempt special predictions for the case in which 
the user makes an error. It is therefore appropriate (and reassuring) that 
the average correlation falls to 0.61 and that only half are significantly 
different from 0 at the 0.01 level. 

A third way by which goodness-of-match may be assessed is to count 
the number of insertions, deletions, and transpositions necessary to 
transform the predicted sequence into the observed sequence. This 
method makes use of the sequence properties of the prediction. The 
column labeled "Sequence" in Table 7.14 gives the nearest predicted 
sequence of operators for each task and the operations necessary to 
transform the predicted sequence into the closest observed one. For 
example, the closest simulation sequence predicted for the replacement 
task A31 was 

GNT S PB PB R GN 19 

But the observed sequence was 

GNT J GL PB PB GN R 19 

(The user used the JumpMethod instead of the ScrollMethod predicted 
and also got the new text from the manuscript before beginning the 
replace command, rather than after.) The differences between these two 
sequences are expressed 

GNT (S)[J GL] PB PB R^GN 19. 

The rounded brackets () mean "delete the enclosed elements," [] means 
"insert the enclosed elements," and ^ means "transpose the adjacent 
elements." 
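
Counting these operations can be mechanized. The sketch below uses a standard dynamic-programming edit distance restricted to insertions, deletions, and transpositions of adjacent elements (no substitutions). It is offered only as one way to obtain the count; it is not claimed to be the procedure by which the entries of Table 7.14 were derived, and the function name is an assumption.

```python
def edit_operations(predicted, observed):
    """Minimum number of insertions, deletions, and adjacent transpositions
    needed to turn the predicted operator sequence into the observed one."""
    n, m = len(predicted), len(observed)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete every remaining predicted operator
    for j in range(m + 1):
        d[0][j] = j  # insert every remaining observed operator
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(d[i - 1][j] + 1,   # deletion
                       d[i][j - 1] + 1)   # insertion
            if predicted[i - 1] == observed[j - 1]:
                best = min(best, d[i - 1][j - 1])  # match, no cost
            if (i > 1 and j > 1
                    and predicted[i - 1] == observed[j - 2]
                    and predicted[i - 2] == observed[j - 1]):
                best = min(best, d[i - 2][j - 2] + 1)  # transposition
            d[i][j] = best
    return d[n][m]


predicted = "GNT S PB PB R GN 19".split()
observed = "GNT J GL PB PB GN R 19".split()
print(edit_operations(predicted, observed))  # 4
```

For task A31 the count is 4: one deletion (S), two insertions (J and GL), and one transposition (R with GN).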

The distance of one string from another can be indexed by computing 
TABLE 7.14 

GOODNESS-OF-FIT FOR PREDICTED SEQUENCES BEST MATCHING SEQUENCES OBSERVED FOR USER S1 

        Structure Comparisons       Time (sec) 
Task    r     Index  Rank  Prob     Pred   Obs     Sequence 

Insertion 
A21     1            5     0.47     9.6    5.9     GNT PR GT PB II 
A2      1            5.5   0.40     9.6    13.7    GNT PR GT PB I\ 
A6      1            14    0.02     21.3   15.9    TP GNT S PR GT PR GT PB 1 4 V 
A32     1            4     0.40     11.2   13.0    GNT PR GT PB 15 V 
A3      1            2.5   0.60     11.3   22.7    GNT PB I GN 19 V 
A14     1            7     0.05     22.8   13.8    GNT PR GT PR GT PR GT PB J GN 20 V 
A18     0.92  0.23   6     0.12     25.2   35.0    GNT (J) PR GT [S] PR GT PR GT PB I GN 11 V 
*A30    0.03  1.44   4     0.35     79.6   141.7   TP GNT (PR)[S] GT [S] PB I GN 510 V [PB PB RIV S S] 

Replacement 
*A27    0.54  0.86   9.5   0.22     13.3   12.6    GNT S [GT] PB (R) GN [I] I V 
*A15    0.75  1.00   6     0.35     11.2   17.1    GNT S [GT] PB R1V [PB D V] 
A8      0.91  0.43   1     1.00     12.7   20.1    GNT J [GT] PB PB R^GN 4 
A17     1            5     0.34     7.3    17.6    GNT PB R4 V 
Aid     0.81  0.67   3     0.55     11.7   13.5    GNT PB PB (R) GN [I] 17 
A31     0.72  0.71   1     1.00     16.2   26.3    GNT (S)[J GT] PB PB R^GN 19 
A4      0.93  0.40   4     0.35     28.7   26.2    GNT S (PR) GT PB [GT] PB R GN 19 V 

Deletion 
*A12    0.54  1.00   6     0.05     17.1   11.8    GNT [J] PR GT (PR)[PB] GT PB D [PB GN I] V 
A23     1            3     0.53     8.9    13.1    TP GNT PB D V 
A19     1            4     0.25     14.4   12.7    TP GNT PR GT PR GT PB PB D V 
A24     0.97  0.20   8     0.02     18.2   21.5    GNT PR GT PR GT PR GT PB [PB] D V 
A5      0.95  0.25   6.5   0.14     19.3   22.6    GNT [S] PR GT PR GT PB PB D 
A28     1            1     1.00     11.2   16.2    GNT PR GT PB PB D 
A29     0.73  0.67   6.5   0.06     24.2   42.2    GNT (S PR)[GT J] GT (PR)[A GT PR GT PB PB D V 

Transposition 
A20     0.92  0.50   2.5   0.66     19.6   17.4    GNT J [GT] PR GT PB (PR GT) D (PB) I V 
*A7     0.70  0.88   4     0.32     14.2   33.7    GNT PR GT PB D [D] (PB) I [I S] V 

Move 
*A11    0.89  1.25   3.5   0.56     14.2   19.6    GNT PR GT PB [PB] D PB I [I 1 V GT PB PB D] V 
*A1     0.83  0.45   4     0.35     19.7   35.4    GNT PR GT PB PB [PB] D (PR GT) PB I V 
A16     0.85  0.83   5     0.20     23.5   14.3    GNT PR GT PR GT PB [PB] D (PR GT PB) I V [PB D I] 
A10     1            6     0.11     19.7   64.2    GNT PR GT PR GT PR GT PB PB D PB I V 

the index 

$$ \text{Index} = \frac{I + D + S_{nc} + T}{\text{number of predicted operator firings}}, $$ 

where 

    I      = Number of operators inserted 
    D      = Number of operators deleted 
    S_{nc} = Number of non-contiguous sites where insertions or 
             deletions occur 
    T      = Number of transpositions. 

This index counts the number of changes per predicted operator firing 
necessary to transform the predicted sequence into the actual sequence. 
For correct tasks the index has a fairly reasonable value of 0.23, about 
one change every four operators. For error tasks, however, the index is 
0.98, or one change for each operator. 
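
As a worked illustration, the index can be computed directly from the four 
counts. The function name is hypothetical, and the counts below are one 
reading of the Table 7.14 rows for tasks A31, A8, and A24; in particular, 
treating a deletion immediately followed by an insertion as a single site 
is an assumption. 

```python
def sequence_distance_index(inserted, deleted, sites, transposed, predicted_firings):
    """Changes per predicted operator firing: (I + D + S_nc + T) / firings."""
    return (inserted + deleted + sites + transposed) / predicted_firings


# A31: GNT (S)[J GT] PB PB R^GN 19 -> 2 inserted, 1 deleted, 1 site,
# 1 transposition, 7 predicted operators.
a31 = sequence_distance_index(2, 1, 1, 1, 7)

# A8: GNT J [GT] PB PB R^GN 4 -> 1 inserted, 1 site, 1 transposition,
# 7 predicted operators.
a8 = sequence_distance_index(1, 0, 1, 1, 7)

# A24: GNT PR GT PR GT PR GT PB [PB] D V -> 1 inserted, 1 site,
# 10 predicted operators.
a24 = sequence_distance_index(1, 0, 1, 0, 10)
```

Under these counts the three values round to 0.71, 0.43, and 0.20. 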

There is no agreed threshold for these statistics that will certify the 
sequence predictions of the model. About the best that can be said is that 
all the statistics agree and indicate that the model makes good 
predictions for user S1's error-free performance. All also agree that 
the sequences predicted by the model are not descriptive of S1's 
performance for tasks on which she makes errors. 

How well does the model predict the frequencies with which the 
alternative sequences are chosen? The column labeled "Prob" in Table 
7.14 records the proportion of sequences occurring in the simulation with 
frequency less than or equal to the matched sequence. If we divide the 
tabulated values into quintiles, an equal number of matches should fall 
into each group. For the correct tasks, this is the case 
(Kolmogorov-Smirnov D(21) = 0.22, p > 0.20). For error tasks, there are more 
sequences in the low probability quintiles than would be expected by 
chance (D(7) = 0.46, p < 0.05). The model satisfactorily predicts 
sequence frequencies for S1's correct tasks; it does not predict sequence 
frequencies for her error tasks. 
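
The uniformity check can be sketched as a one-sample Kolmogorov-Smirnov 
statistic against the uniform distribution on [0, 1]. The code below is an 
illustration, not the original analysis: the function name is hypothetical, 
the 21 values are read from the "Prob" column of Table 7.14, and taking the 
unstarred rows to be the correct tasks is an assumption. 

```python
def ks_statistic_uniform(values):
    """Largest gap between the empirical CDF and the uniform CDF on [0, 1]."""
    xs = sorted(values)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


# "Prob" entries of the unstarred rows of Table 7.14, in table order.
prob_correct = [
    0.47, 0.40, 0.02, 0.40, 0.60, 0.05, 0.12,   # insertion
    1.00, 0.34, 0.55, 1.00, 0.35,               # replacement
    0.53, 0.25, 0.02, 0.14, 1.00, 0.06,         # deletion
    0.66,                                       # transposition
    0.20, 0.11,                                 # move
]

d_correct = ks_statistic_uniform(prob_correct)
```

With these values the statistic comes to about 0.22, in line with the figure 
quoted above for the correct tasks. 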

Prediction of Time Distributions 

If the prediction of the sequences is acceptable, how well does the 
model fare in predicting the time necessary to perform the task? It 
should be remembered that these predictions are essentially 
zero-parameter predictions. The first obvious comparison to make is between 
the observed times for S1 and the predicted times of the best match 
sequences of Table 7.14. This comparison is made in the column labeled 
"Best Match" of Table 7.15. As can be seen from the table, the predicted 
average time per task for correct tasks is about 43% too low. The root 
mean square of the error (RMSE) is about 36% of the average time/task. 
The correlation between predicted times and observed times is quite low 
(r = 0.32) for correct tasks, but very high (r = 0.99) for error tasks. 
The very high correlation for error tasks results partially because two 
tasks with a large number of characters to be typed are error tasks. 
Actually, in absolute terms, the error tasks are predicted less well than 
the correct tasks, as can be seen from the larger RMSE for error tasks, 
some 48% of the mean predicted time. 
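
For readers who want to recompute such figures, a minimal Python sketch of 
the RMSE and its expression as a percentage of the mean time is given 
below. The function names are hypothetical, and the five predicted and 
observed times are simply the first five insertion rows of Table 7.14, used 
as sample input rather than as a reproduction of Table 7.15. 

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean square of the prediction errors."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


def percent_of_mean(value, times):
    """Express a value as a percentage of the mean of the given times."""
    return 100.0 * value / (sum(times) / len(times))


predicted_times = [9.6, 9.6, 21.3, 11.2, 11.3]   # sec
observed_times = [5.9, 13.7, 15.9, 13.0, 22.7]   # sec

error = rmse(predicted_times, observed_times)
relative_error = percent_of_mean(error, observed_times)
```

For this small sample the RMSE is about 6.2 sec, or roughly 44% of the mean 
observed time. 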

The next comparison to make is between the observed times and the times 
predicted by the model using the mean time for each operator. This 
comparison appears in the column labeled "Var Off." Surprisingly, these 
predictions do just as well, if not slightly better (in the sense that they 
have a higher correlation and lower RMSE), than the "best match" predictions. 

Finally, the predictions of the model using gamma-distributed random 
numbers with means and variances as given by the model parameters are 
compared with S1's observed performance in the column labeled "Var 
On." The main point of this model is to attempt a prediction of the 
distribution of task times. Since data were not available for S1 
performing the same set of tasks several times, the model was used to 
predict the standard deviation of all the tasks taken together. Using the 
operator variances causes the model to increase its prediction of the 
standard deviation of the combined set of tasks by about 50%, to a value 
closer to but still only half of that observed for correct tasks. 
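
The difference between the "Var Off" and "Var On" predictions can be 
sketched as follows. This is a minimal illustration under stated 
assumptions: the operator means and standard deviations are hypothetical 
placeholders (not the model parameters of this chapter), and task time is 
taken to be the sum of independent operator times. 

```python
import random
from statistics import mean, stdev

# Hypothetical operator parameters: (mean sec, standard deviation sec).
OPERATORS = {"GNT": (2.0, 1.0), "PB": (1.5, 0.5), "D": (0.5, 0.2), "V": (1.0, 0.6)}


def task_time_var_off(sequence):
    """'Var Off': sum the mean time of each operator in the sequence."""
    return sum(OPERATORS[op][0] for op in sequence)


def task_time_var_on(sequence, rng):
    """'Var On': draw each operator time from a gamma distribution."""
    total = 0.0
    for op in sequence:
        m, sd = OPERATORS[op]
        shape = (m / sd) ** 2      # gives mean m ...
        scale = sd ** 2 / m        # ... and variance sd**2
        total += rng.gammavariate(shape, scale)
    return total


rng = random.Random(0)
sequence = ["GNT", "PB", "PB", "D", "V"]
fixed_prediction = task_time_var_off(sequence)
samples = [task_time_var_on(sequence, rng) for _ in range(5000)]
sample_mean, sample_sd = mean(samples), stdev(samples)
```

Both versions predict the same mean task time; only the "Var On" version 
yields a spread of times for a single sequence, which is what allows a 
standard deviation to be predicted. 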

One way of putting these predictions into perspective is to compare 
the time predictions with the predictions that could be made by simply 
timing how long it takes another user to do the same task. This 
comparison is made in the last three columns. Roughly, the model does 
about as well as experimentally running another user. 



7.4 SUMMARY 

The model is a reasonably good predictor of the sequence of 
operators for tasks done correctly by the user. 

It exactly predicted the sequence half the time. 

The predicted frequencies of operators correlated 0.94 with those 
observed. 

The sequences occurred with the predicted frequencies. 

TABLE 7.15 
Comparison of Observed Editing Time by Subject S1 with Predictors 

                          Observed     Model Prediction       Compared to Other Users 
                             S1      Best    Var     Var        S2      S3      S4 
                                     Match   Off     On 

Correct Tasks (N = 21) 
  Mean time/task (sec)      22.6     15.8    14.9    14.6       19.0    13.7    55.5 
  SD time/task (sec)        13.8     6.1     4.6     6.6        17.3    18.4    62.5 
  RMSE (sec)                         14.5    13.8    14.7       10.4    13.2    63.5 
  r                                  0.32    0.57    0.42       0.82    0.73    0.70 

Error Tasks (N = 7) 
  Mean time/task (sec)      38.9     23.2    24.3    23.3       49.4    26.7    69.2 
  SD time/task (sec)        46.3     24.9    23.9    26.5       95.9    42.0    99.2 
  RMSE (sec)                         26.2    25.7    26.3       48.8    15.0    58.6 
  r                                  0.98    0.99    0.98       0.98    0.98    0.98 

All Tasks (N = 28) 
  Mean time/task (sec)      26.7     17.6    17.3    16.7       29.9    17.0    61.2 
  SD time/task (sec)        25.9     13.3    12.6    19.9       50.3    22.9    71.4 
  RMSE (sec)                         18.2    17.5    18.3       26.4    13.7    62.3 
  r                                  0.86    0.92    0.89       0.94    0.94    0.81 

On the average, it was necessary to make one adjustment to the 
sequence for each four operator firings. 

It does not try to predict the sequence of operators which will ensue 
after an error and these error sequences are significantly different from 
the predicted sequences. 

The mean time per task is predicted by the model to within about 50- 
60% of the observed values. 

The observed standard deviation is about twice what the model 
predicted. 

The root mean square of the error deviations between predicted and 
observed values is about 50-60% of mean time for a task. This compares 
to the roughly 30% obtained for models of the POET editor. 

The simulation model predicts the times roughly as well as running a 
human subject. 

These results are for essentially zero parameter predictions and with 
rather sketchy data. The fit of times could probably be improved 
somewhat by better quality data. For example the parameters based on 
the POET editor and other sorts of Editor Y tasks apparently 
underestimated the variance in this situation. On the other hand such 
simulations face limitations in their reliance on the summation rule for 
accumulating times, and other techniques for estimating times should be 
compared against these results to ensure there is sufficient gain from the 
simulations to repay their costs. 
8 
Conclusion 



8.1 Findings 

8.2 Routine cognitive skill 

The internal mechanisms of the user 
The principle of limited rationality 
Relation to problem solving 
Relation to other skilled behavior 



What can we predict about the behavior of the user in the 
manuscript editing task? What will be his sequence of actions? How 
long will it take him to perform them? What sorts of errors will he 
make? What do we now know about user performance in editing 
systems? Let us review the major findings of the thesis. 



8.1 FINDINGS 

The main findings of the thesis are classified according to topic in Table 
8.1. The six studies described in the preceding chapters fall into a 
cyclical pattern of basic research and application. 



[BASIC RESEARCH] 

I. What performance differences are there between editors? 

(Chapter 2; Benchmark study of editing systems) 

Significant speed differences (up to about a factor of 3) exist between 
editors as a consequence of the number of keystrokes/task required and the 
use of a page display. 

2A. When editors were compared by determining the times required 
by expert users to perform benchmark tasks: Letter Typing, 
Manuscript Editing, Table Typing, and Text Assembly, there 
was a greater difference among editors in the time to perform 
the Manuscript Editing benchmark than in the time to perform 
the other benchmarks. 

2B. The RCG editing system (7 sec/modification) is faster than 
Editor Y (10 sec/mod), which is faster than TECO (11 
sec/mod), which is faster than POET (17 sec/mod). All were 
much faster than using a typewriter. 

2C. The ratio between the time required to edit the manuscript in 
Chapter 2 with the slowest and the fastest editor is 2.3. Making 
allowances for the likely existence of slightly better and slightly 
worse editors, it is probably correct to say the design of the 
editor can make a difference of a factor of 3 in time to edit a 
manuscript. 

2D. There was a difference of about a factor of 1.2 between the best 
and the worst expert user within an editor. 

2E. Teletype-like editors took twice as long as display-oriented 
editors (even though compared on CRT output devices of the 
same speed). 

2F. On the benchmark task, errors increased the average 
time/modification by 9% (Range 0 to 24%). 

2G. The speed of an editor is largely a function of the number of 
keystrokes required to do a task. 



[APPLICATION] 

II. Can a priori analysis yield significant predictions about 
editing performance? 

(Chapter 3; Application: Predicting When an Editing System is Better 
Than a Typewriter) 

It was possible on the basis of a simple a priori analysis, using data 
determined in Chapter 2, to predict the (surprising) outcome of a user 
experiment on text-editing. 

3A. The time to type a text and the time to edit a text with 
WYLBUR are well fit with simple linear models. 

3B. From these simple theories, the length of text at which it 
becomes faster to use an editor can be derived, as well as the 
density of modifications per line of text at which it becomes 
faster to retype the text instead of editing it. 

3C. For a WYLBUR-like (POET-like) editor the modification 
crossover point is 0.7 modifications/line, hence if there is more 
than one modification every 1/0.7 = 1.4 lines, it is better to 
retype the text from scratch than to edit it. 

3D. A user should be able to type about 16 new words in the time 
necessary to make a correction with WYLBUR, hence it is 
better to type at slow speed making few errors than to type at 
high speed with the intention of making corrections during a 
second pass. 

3E. Most of the prediction error of the Constant Time per 
Modification model arose from errors in setting the parameters 
rather than from errors in the form of the model. 

3F. Taylor approximations can be used to linearize the formulae for 
expected editing time in such a way that the sensitivity of 
predictions to error in input parameters can be determined and 
so that the blame for prediction errors can be localized. 



[BASIC RESEARCH] 

III. How is editing behavior organized? What effect does the 
grain of analysis have on the quality of the model? 

(Chapter 4; The GOMS Model of Manuscript Editing) 

Editing behavior is organized into cyclical operator sequences ("unit task 
cycles"). It can be described in terms of Goals, Operators, Methods, and 
Selection Rules. Refining the grain size of a model does not seem to 
affect its predictive ability much. 

4A. User behavior in the manuscript editing task can be modeled in 
terms of a small set of Goals, Operators, Methods, and Selection 
Rules (GOMS). These elements can be used to produce models 
that predict the sequence of operators and the time to perform a 
modification. 

4B. GOMS models can be written at several grain sizes: Constant 
time/modification models, task step models, argument-level 
models, and movement-level models. 

4C. The GOMS model for POET did not improve in its ability to 
predict performance (of a single user) on a new task after the 
grain size was reduced below the 4 sec level (task step model). 

4D. With two or three selection rules it is possible to predict method 
choices with 80-90% accuracy. 

4E. The time for a given individual modification can be predicted 
by a GOMS model to within 20-35%. 

4F. Errors (for one subject using POET) increased editing time by 
25%. 

4G. Individual differences were found among users in their 
preferences among methods of locating the target line. 



[APPLICATION] 

IV. How can rough performance measures be estimated for a 
system when it is in the conceptual design stage? 

(Chapter 5; Application: Predicting User Performance at an Early Stage 
in System Design) 

The unit task pattern discovered in Chapter 4 suggests a technique: 
enumerate the number of unit tasks and estimate the number of these 
required for a task. 

5A. Since user performance on many editor-like systems is organized 
as a set of unit task cycles, the unit tasks of a system can be 
used at design time to predict the amount of time various user 
activities on the system can be expected to take. 

5B. In order to carry through the above predictions, data on the 
relative number of various features per page were collected 
(Table 5.2). 

5C. Using the number of unit tasks per task, the number of tasks 
per page, and the time per unit task it was possible to estimate 
the user time per page for different system configurations 
without knowing the details of the system. 

5D. While unit tasks do interact with each other, attempting to 
capture this interaction by analysis at the next level of detail 
changed the results by only about 6%. 
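
The estimate described in 5C is a simple product and sum, sketched below in 
Python. All names and numbers here are hypothetical placeholders chosen for 
illustration; they are not the counts of Table 5.2 or the times measured in 
the thesis. 

```python
def time_per_page(tasks_per_page, unit_tasks_per_task, seconds_per_unit_task):
    """User time per page = (total unit tasks on the page) x (time per unit task)."""
    unit_tasks = sum(
        count * unit_tasks_per_task[task] for task, count in tasks_per_page.items()
    )
    return unit_tasks * seconds_per_unit_task


# Hypothetical design-time inputs for one system configuration.
tasks_per_page = {"insert_word": 3, "move_paragraph": 1}        # tasks per page
unit_tasks_per_task = {"insert_word": 1, "move_paragraph": 2}   # unit tasks per task
estimate = time_per_page(tasks_per_page, unit_tasks_per_task, seconds_per_unit_task=10.0)
```

Comparing two proposed configurations then amounts to changing the unit 
tasks required per task and recomputing the total. 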



[BASIC RESEARCH] 

V. Which is the best pointing device and why? 

(Chapter 6; Evaluation of Mouse, Rate-controlled Isometric Joystick, Step 
Keys, and Text Keys for Text Selection on a CRT) 

The mouse is fastest and most error free of the tested devices. 

6A. The mouse (average positioning time 1.7 sec) is faster than the 
measured joystick (1.8 sec), which is faster than the text keys (2.3 sec), 
which are faster than the step keys (2.5 sec). 

6B. The time to position a cursor with a key device (step keys or 
text keys) is proportional to the number of keystrokes necessary. 

6C. The time to position a cursor with an analog pointing device 
(mouse or joystick) is proportional to 
$\log_2(\text{Distance}/\text{Target-size} + 0.5)$. 

6D. The time to position a cursor with the mouse is 

$T = 1 + 0.1 \log_2(\text{Distance}/\text{Target-size} + 0.5)$ sec. 

6E. The time to position the joystick measured was 

$T = 1 + 0.2 \log_2(\text{Distance}/\text{Target-size} + 0.5)$ sec. 

6F. The time to position the step keys is 

where $(D_x/S_x + D_y/S_y)$ is the number of keystrokes to the 
target. 

6G. The time to position the text keys is 

$T = 0.7 + 0.21\,N_{min}$, 

where $N_{min}$ is the minimum number of keystrokes to the target. 

6H. The speed with which pointing can be done with the mouse is 
close to the theoretical minimum. 
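
The two analog-device formulas (6D and 6E) can be evaluated with a short 
Python sketch. The function names are illustrative, the coefficients are 
the rounded ones given in those findings, and distance and target size are 
assumed to be in the same units so that only their ratio matters. 

```python
from math import log2


def mouse_time(distance, target_size):
    """Positioning time in seconds for the mouse (finding 6D)."""
    return 1.0 + 0.1 * log2(distance / target_size + 0.5)


def joystick_time(distance, target_size):
    """Positioning time in seconds for the measured joystick (finding 6E)."""
    return 1.0 + 0.2 * log2(distance / target_size + 0.5)


# A target 7.5 target-widths away: log2(7.5 + 0.5) = 3 bits.
t_mouse = mouse_time(7.5, 1.0)
t_joystick = joystick_time(7.5, 1.0)
```

For this 3-bit movement the formulas give 1.3 sec for the mouse and 1.6 sec 
for the joystick; the gap between the devices grows with the difficulty of 
the movement because the slopes differ while the intercepts do not. 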



[BASIC RESEARCH] 

VI. How can the results be integrated into a simulation? 
How can stochastic uncertainty be added to the model of 
Chapter 4? Can the model be extended to a display editor? 

(Chapter 7; Using GOMS to Simulate User Performance on a Display 
Editor) 

A running computer simulation program was constructed incorporating 
many of the results of the thesis. This program, to date, is about as good 
at predicting editing performance as running another user. 

7A. A GOMS model can be constructed and implemented down to 
the detail necessary for a running computer simulation for a 
display-based editor (Editor Y). 

7B. The space of simple editing tasks is as given in Figure 6.3. 



7C. The number of times a user looks at the manuscript before 
selecting a target with his pointing device was Poisson-distributed. 

7D. Users will scroll when the target is in the bottom third of the 
screen. 

7E. Using two or three selection rules it was possible to predict the 
method for pointing at a target about 85% of the time. 

7F. Users were quite similar in their choice of when to scroll. The 
main difference in screen movement methods found was 
whether or not the user ever used the jump command. 

7G. Predictions of a GOMS simulation are about as good as 
predictions from examining the performance of another user in 
the same task. 

7H. The simulator predicted sequences quite well (correlation between 
predicted and observed operator frequencies = 0.94; on the 
average, it was necessary to make one adjustment to the 
sequence for each four firings). 

7I. The simulator predicted the mean time per task to within about 
50-60% of observed values. 

7J. The observed standard deviation of the time to do a task was about 
twice what the model predicted. 

In Table 8.1, the Basic Research- Application cycle is broken down 
still further. Some findings from the basic studies contribute 
measurements of constants such as performance time/operator. Other 
findings represent the results of theory building exercises. Finally, some 
findings have to do with attempts to verify theories and hypotheses. 
Table 8.1 shows that all are represented in the thesis. 

The other side of the table lists three classes of applications: results 
can be used (1) to calculate predictions of behavior, (2) to make design 
decisions, and (3) to evaluate systems. The thesis has gone only so far 
as to make some sample applications of the results obtained. 



8.2 ROUTINE COGNITIVE SKILL 

Having laid out the results of our investigation into the manuscript 
editing task, it is time to ask how behavior in this task is related to, on 
the one hand, problem solving behavior and, on the other hand, skilled 
performance. Ultimately, the issue can be stated: how does the 
behavior of a user in the manuscript editing task arise from the nature 
of the user and his environment? The question is best discussed with 
respect to fundamental notions of man as symbol processing system. 

The Internal Mechanisms of the User 

Figure 8.1 depicts a model of the internal mechanisms of the user. 
The model comes from a wide range of basic experiments in 
psychology. Immediate sources for the figure are Newell and Simon 
(1972, Figure 4.1), Welford (1976, Figure 1.1), Sheridan and Ferrell (1974, 

TABLE 8.1 
Classified Table of Findings 

                                        BASIC RESEARCH                      APPLICATION 
TOPIC                         Data Base    Theory       Verification  Calculation  Design  Evaluation 

I.   PERFORMANCE PARAMETERS 
     Time                     2A,2B,2C,2D  2F,2H,3A     3E,4E,7G      3E,3D,3C,5C  5A      2A,2B 
                              2E,2F,2G,4F  3F,4A        -             3A           -       - 
     Time Distribution        -            7J           7J            -            -       - 
     Errors                   2G,4F        -            -             3D,3C        -       - 
     Operator Sequences       -            4A,7H        7H                         -       - 
     Method Choices           -            4D,7E        4D,7E                      -       - 

II.  EDITORS 
     Comparisons              2B,2C,2E                  -                                  2A,2B 
     WYLBUR Editor            -            -            3A                                 - 
     POET Editor              2G,4H        40           4C,4D,4E      30,30                2B 
     TECO Editor              2B           -            -                                  2B 
     Editor Y                 2B,70,70     7C,7E,7J     7E,7G,7J      50                   2B 
                              -            7A           -                                  - 
     RCG Editor               2B           -            -                                  2B 
     Typewriter               2B           -            -                                  2B 

III. POINTING DEVICES 
     Mouse                    5B           6C,6D,6I,6A  7E                         -       - 
     Joystick                 6A           6C,6E        -                          -       - 
     Text Keys                6A           6B,6H        -                          -       - 
     Step Keys                6A           6B,6Q        -                          -       - 

IV.  INDIVIDUAL DIFFERENCES   2A,7F,4H     -            -                          -       - 

V.   MODELS 
     Task Analysis            5B           4A,7B        7G,7A                      5A 
     Model Grain              -            4B           40                         - 
     Constant Time/UT Mods    -            2H,3F,4B     3E,3A,4E      50,3B,30     5A 
     UT Step-Level            -            4B           4E            50           - 
     Argument-Level           70           4B           4F                         - 
     Movement-Level           -            4B 

Figure 1.2), and results of the present thesis. The user has a single long 
term memory (LTM) of unlimited size which holds essentially 
permanent knowledge. There is also a short term memory (STM) of 
very limited size which holds the knowledge of the current task context; 
this represents the user's focus of attention. The user has organs of 
sensation with their associated buffer memories and some perceptual 
processing mechanism capable of rendering the sensations into symbols 
and depositing them in the short term memory. He also has motor 
organs and a motor control mechanism capable of executing limited 
motor programs. 

The three major divisions of the central mechanisms deal 
respectively with perception, cognition, and the controlling of action 
(Welford, 1968; Welford 1976). Knowledge flows into the short term 
memory from both the external environment of the user and from his 
long term memory. Both of these flows are determined or moderated 
by the knowledge in the short term memory at each instant. Similarly, 
the user's initiation and control of external (motor) actions and of the 
addition of new knowledge to his long term memory is determined by 
the contents of working memory. 

The perceptual component and the motor control component of the 
user are each capable of acting with a limited parallelism with the 
cognitive component. Thus a user may touch type, the perceptual 
mechanisms staying a word or two ahead of the motor control 
mechanism (Shaffer, 1976). Thus a child arranging a set of blocks by 
height may reach for the next block even as he is deciding where to put 
it (Young, 1976). 

The cognitive component may be thought of as working with three 
logical processes: GET, SELECT-METHOD, and EXECUTE. These 
processes relate the goals, operators, methods, selection rules, and 
knowledge elements which describe the task, all in long term memory, 
to the instantaneous state of the task as represented in short term 
memory. 
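
One way to picture the three processes is as a simple control loop, sketched 
below in Python. This is a hypothetical illustration, not the simulation 
program of Chapter 7: the data structures, the single selection rule, and 
its threshold are assumptions, and the operator letters are borrowed from 
the sequences of Table 7.14 only as labels. 

```python
def select_method(stm, selection_rules, default_method):
    """SELECT-METHOD: the first rule whose condition holds for the STM contents wins."""
    for condition, method in selection_rules:
        if condition(stm):
            return method
    return default_method


def run_unit_tasks(unit_tasks, selection_rules, methods, default_method):
    """Cycle GET -> SELECT-METHOD -> EXECUTE over a list of unit tasks."""
    trace = []
    pending = list(unit_tasks)
    while pending:
        stm = dict(pending.pop(0))                                    # GET
        method = select_method(stm, selection_rules, default_method)  # SELECT-METHOD
        trace.extend(methods[method])                                 # EXECUTE
    return trace


# Methods and selection rules stand in for knowledge held in long term memory.
METHODS = {"ScrollMethod": ["S", "PB"], "JumpMethod": ["J", "PB"]}
RULES = [(lambda stm: stm["lines_away"] > 20, "JumpMethod")]  # hypothetical rule

trace = run_unit_tasks(
    [{"lines_away": 5}, {"lines_away": 40}], RULES, METHODS, "ScrollMethod"
)
```

The point of the sketch is only the division of labor: the knowledge 
structures are fixed, while the contents of short term memory at each 
instant determine which method is chosen and executed. 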

The Principle of Limited Rationality 

The behavior of the user is the result of his attempt to achieve his 
goals by rationally acting in accord with the imperatives of the task 
environment (Simon, 1947; Simon, 1969; Newell and Simon, 1972). 
Thus, to change a word in the manuscript (goal) he employs the replace 
command (rational behavior), typing in the new word indicated on the 
manuscript (task environment). The behavior of the user is also the 
result of limitations in his ability to behave rationally. To edit the 
manuscript (goal) he does not first read all of the indicated 
modifications at once and then proceed to implement them (rational 
behavior) because he cannot remember them (limitation). 

Both the rationality of the user and the limits on that rationality are 
apparent in the model of the user in Figure 8.1. The rationality of the 
user is located in the analysis of the task environment that is embedded 
in the goals, operators, methods, and selection rules stored in long term 
memory. The limits of the user are in the form of processing or size 
limits on the parts of the figure. The principal ones may be expressed 
in very approximate numbers: 

Several of the results for this thesis can be derived directly from this 
simple view of the human information processing architecture: 

(1) The user can do one main (cognitive) thing at a 
time — what the working memory determines at each 
instant — although he can overlap very limited amounts of 
perceptual and motor processing. 

(2) His working memory is limited and overloading the 
working memory causes some immediate knowledge to 
be lost. Thus the user needs to consult the manuscript 
several times in the course of a modification. 

(3) His knowledge in long term memory can only be 
retrieved if it can be accessed via the knowledge already 
in working memory. Thus, complete failures to recall 
relevant knowledge are possible, as are retrievals of 
inappropriate knowledge because the retrieval clues were 
inadequate. 

(4) Adding knowledge to his long term memory takes much 
longer than retrieving knowledge. The user is forced to 
rely on his limited working memory to cope with rapidly 
changing task data. 

(5) The information that the user encounters in the outside 
world must be encoded if it is to become usable and this 
can be done only in terms of the knowledge already 
available in the long term memory. Thus, the user can 
only learn new things in terms of knowledge he already 
has. 

(6) The time/operator for a model at the individual user 
movement level (Model M0.5 in Chapter 4) can be 
expected to be on the order of 0.5 sec, the shortest time 
for a perceptual and motor cycle. 

(7) The time to perform a keystroke is just the minimum 
time for a motor movement — around 0.2 sec. 

(8) The movement times for the pointing devices cannot 
exceed about 0.2 sec/keystroke for the key devices or 0.1 
sec/bit for the continuous movement devices. 

Relation to Problem Solving 

That the picture of the human processor is basically the same 
picture that emerges from the study of problem solving can be seen by 
comparing Figure 8.1 to the description of a problem solver redrawn 
from Figure 4.1 of Newell and Simon (1972) in Figure 8.2. In both, the 
human processor works by selecting and then applying methods from 
long term memory as controlled by the current contents of short term 
memory. Since the task in problem solving experiments is typically to 
solve a single problem rather than a series of tasks as in skilled 
performance, there is no GET in Figure 8.2. Also in Figure 8.1 the 
process CHANGE REPRESENTATION would be considered as just 
another method. The perceptual and motor control mechanisms are 
represented in Figure 8.1 in more detail, reflecting the important role 
these play in skilled performance. 

The difference between manuscript editing and problem solving is 
that in the former there is no problem space, no search. The methods 
are almost certain of success. A skilled performance, like manuscript 
editing, is thus a special case of problem solving. 

Of course, whether a user will exhibit problem solving or skilled 
performance depends not only on the task but on the state of the user 
as well. A nice example of the transition between one and the other 
can be seen in Simon and Reed's (1976) experiment on the five 
missionaries/ five cannibals problem. The problem space for this task is 
given in Figure 8.3. The subjects' searches through the problem space 
were characterized by two strategies: a "balance" strategy which 
emphasized balancing the number of missionaries and cannibals on each 
side of the river and a "means-ends" strategy which emphasized ferrying 
the maximum number of persons across the river. Subjects initially 
used the (defective) balance strategy and later shifted to the means-ends 
strategy. When subjects were retested on this problem after a successful 
solution, they shifted earlier to the successful strategy. In other words, 
as a result of practice, part of the searching behavior was eliminated and 
replaced by the selection of and application of methods more certain of 
success. Presumably after a number of tries at the problem, they would 
embrace the means-ends strategy from the start and follow the route 
marked on the figure. All search would be eliminated. While Simon and 
Reed's subjects exhibited problem solving behavior on their first trial, 
by their second attempt at the problem they had already begun to shift 
toward skilled performance. 

Relation to other skilled behavior 

Manuscript editing, as performed by the users studied in this thesis, 
is clearly an example of skilled performance, characterized as it is by 
"competent, expert, rapid, and accurate performance" (Welford, 1968; p. 
12). To emphasize the user's mental engagement in the task, the 
unruffled way in which each new problem can be paired with an old 
and trusty method, and the close relationship of the manuscript editing 
task to many other tasks of this sort in daily life, we shall speak of the 
manuscript editing as a routine cognitive skill. 

Although it is not possible to state a general theory of routine 
cognitive skill at this point, it is possible to note some ways in which 
manuscript editing appears to be specialized, and thus to indicate where 
the general theory might be significantly different from the one 
presented here. Dimensions on which routine cognitive skills can vary 
are listed below: 

(1) Unit task structure: Manuscript editing is structured into a 
sequence of almost totally independent unit tasks, each of a few 
seconds' duration. This provides an extremely short time horizon 
for the integration of behavior. There are routine cognitive skill 
tasks with even shorter unit tasks, such as making change. More 
often, the relevant task duration is much longer, such as 
cataloging a book for a library. However, it is most important to 
note that many routine cognitive skills such as typing (Shaffer, 
1976) do not have any unit task structure at all, but are essentially 
continuous activities. 

(2) Activity role of the task environment: In manuscript editing, the 
task environment is not passive, as in the task of writing a 
business letter by hand, but responsive to the user's actions. Yet 
neither does the task environment initiate new action. By contrast, in 
the task of taking airline travel reservations over the phone, the 
task environment is the major agent of initiation. 

(3) Payoff characteristics: The usual trade-off between speed and 
accuracy is reflected in this dimension: consider the difference 
between typing a rough draft and typing a final copy. Both speed 
and accuracy are important in manuscript editing, with absolute 
priority given to accuracy. The emphasis on accuracy plus the 
detectability of errors leads to the conversion of errors to time. 

(4) Input to LTM: With respect to knowledge in long-term memory, 
a distinction may be drawn between memory for the components 
of the skill itself, and memory for the objects in the environment 
being processed by the skill. It is this latter to which we refer 
here. In the game of bridge, the task-specific knowledge players 
must input into LTM is fairly large, for they must be able to 
remember all the cards played. There is little long-term 
task-specific information to remember in manuscript editing other than 
the assimilated knowledge of the skill of the specific editor. 

(5) Retrieval from LTM: A security guard must draw on LTM 
frequently and rapidly to recognize many faces, whereas the 
information needed by an elevator operator is provided by the 
task environment. Modest amounts of information need to be 
retrieved from LTM for manuscript editing, consisting largely of 
information about the editing system and small amounts of 
information about the specifics of the editing task. 

(6) Type of cognitive activity: In manuscript editing the primary type 
of cognitive activity is interpretation: interpretation of the 
instructions of the manuscript as commands and interpretation of 
the feedback from the editor in terms of successful performance 
of the corrections to be made. There are no forms of routine 
reasoning, design, or evaluation activity in manuscript editing. In 
contrast, the electrician's task of installing a new fixture may 
require considerable routine inference to discover the connections 
for circuits only partially seen. (What differentiates the 
electrician's task from problem solving is that the problems are 
familiar and the solution schemata previously stored, e.g., in the 
National Electrical Code.) 

(7) Degree of motor involvement: This dimension lies along the major 
discrimination of motor skills from cognitive skills. Motor 
involvement in many cases implies the lightening of the cognitive 
load, since the motor activity may not demand too much cognitive 
involvement. Dictating to a secretary is much more cognitively 
intense than composing on a typewriter, where the cognitive 
activity may pause periodically while the motor activity catches 
up. In the manuscript editing task, 40% of S13's time was spent in 
motor operations. 

(8) Requirement for inventing plans: The issue here is whether a goal 
hierarchy (created from a limited stock of goal types) is sufficient 
to control behavior or whether new plans have to be constructed 
and then interpreted. Simple manuscript editing, such as we have 
been discussing, requires no planning; the goal structure provides 
adequate control. But if the editing task were made more 
complex — say, by requiring shuffling of manuscript sections — then 
planning would be required to decide on a correct and efficient 
order in which to make the changes. 

(9) Conditionality: Conditionality is the extent to which the operators 
recombine in different sequences. At one extreme are the single, 
repetitive, assembly-line sequences studied by industrial engineers. 
At the other end of the scale is the short-order cook, 
simultaneously processing a random mixture of pancakes, eggs, 
and home-fries. His behavior is highly conditional, deriving from 
the large number of different combinations of orders that can 
come even from a small menu. Manuscript editing sits in the 
middle, having only a moderate amount of conditionality, 
expressed mainly in the selection of methods and in the optional 
choices of operators within methods. 
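
As a rough illustration of dimension (3), the conversion of errors to time can be sketched numerically. The sketch below is not a model taken from this thesis: it assumes, purely for illustration, that every error is detected and corrected exactly once, and that a correction costs a fixed amount of time. The function name and all parameter names are hypothetical.

```python
def expected_task_time(error_free_s, p_error, correction_s):
    """Expected time per unit task, in seconds, when errors become time.

    Illustrative assumptions (not from the source): each unit task
    produces at most one error, every error is detected, and each
    correction costs a fixed `correction_s` seconds.
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must be a probability between 0 and 1")
    # Accuracy has absolute priority, so an error is never left in place;
    # it shows up only as added time for the correction.
    return error_free_s + p_error * correction_s
```

Under these assumptions the final manuscript is always correct, and a higher error rate appears only as a longer expected time per task.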
NOTES 

1 It is commonly assumed that LTM is very large, large enough so 
that its physical capacity is not a limitation on performance. 

2 Klatzky, 1975. 

3 Identified with the time necessary to do the letter name matching in 
the Posner letter task (Posner, Boies, Eichelman, and Taylor, 1969). 

4 Dansereau, 1968. 

5 Welford, 1960. 